157 changes: 157 additions & 0 deletions .github/copilot/instructions/performance-monitoring.md
@@ -0,0 +1,157 @@
# Performance Monitoring Guide

## Overview
This guide covers performance measurement strategies for ctrld-sync: how to measure the impact of changes, where the common bottlenecks are, and how to test improvements.

## Key Performance Metrics

### User-Facing Metrics
- **Cold start sync time**: First run without cache (baseline: ~10-30s for 23 folders)
- **Warm cache sync time**: Subsequent runs with valid cache (target: <5s)
- **API calls per sync**: Total API requests made (fewer is better, measure before/after)
- **Rules processed per second**: Throughput for large rule sets

### System Metrics
- **Cache hit rate**: (hits + validations) / total requests (target: >80% on warm runs); see the computation sketch after this list
- **Memory peak usage**: Maximum RSS during sync (baseline: <100MB for typical workloads)
- **API rate limit headroom**: Distance from 429 errors
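
The cache hit rate above can be computed directly from the stats counters that `main.py` keeps (`_cache_stats` with `hits`, `misses`, `validations`). The helper below is a minimal, hypothetical sketch of that arithmetic, not project code.

```python
def cache_hit_rate(stats: dict) -> float:
    """Hit rate = (hits + validations) / total lookups; 0.0 when nothing has run yet."""
    total = stats["hits"] + stats["misses"] + stats["validations"]
    return (stats["hits"] + stats["validations"]) / total if total else 0.0


# Example: 80 in-memory hits plus 5 successful 304 validations out of 100 lookups -> 0.85
print(cache_hit_rate({"hits": 80, "misses": 15, "validations": 5, "errors": 0}))
```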

## Critical Constraints

### API Rate Limits (Non-Negotiable)
The Control D API has strict rate limits. Existing safeguards:
- Serial processing (max_workers=1) to prevent 429 errors
- 60-second wait after deletions to prevent "zombie state"
- Conservative DELETE_WORKERS=3 for parallel deletes
- Batch size of 500 items per request (empirically chosen)

**Never increase parallelism without first implementing rate limit header parsing.**

### Batch Size Rationale
The 500-item batch size is not dynamic. It was chosen through production testing to stay under API limits. Rather than smart batching, focus on better retry logic (exponential backoff with jitter).
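
A minimal sketch of exponential backoff with jitter follows. It is illustrative only: the function name, attempt count, and delays are assumptions and do not describe the project's current `_retry_request`.

```python
import random
import time

import httpx


def request_with_backoff(send, max_attempts: int = 5, base_delay: float = 1.0) -> httpx.Response:
    """Call send() and retry 429/5xx responses with exponential backoff plus full jitter."""
    response = send()
    for attempt in range(1, max_attempts):
        if response.status_code not in (429, 500, 502, 503, 504):
            return response
        # Sleep anywhere between 0 and base_delay * 2^attempt, capped at 60 seconds.
        time.sleep(random.uniform(0, min(base_delay * (2 ** attempt), 60.0)))
        response = send()
    return response  # caller handles the final response, successful or not
```

In practice this would wrap the same callables already passed to the retry helper (for example `lambda: client.get(url)`).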

## Measurement Strategies

### Quick Synthetic Benchmarks
```bash
# Measure cache effectiveness (after implementation)
python main.py --dry-run # Should show cache stats

# Measure rule validation performance
pytest tests/test_security.py -v # Look for slow tests

# Profile memory usage
python -m memory_profiler main.py --dry-run
```

### Realistic User Journey Tests
```bash
# Cold start (clear cache first)
rm -rf ~/.cache/ctrld-sync/blocklists.json
time python main.py --dry-run

# Warm cache (run twice)
time python main.py --dry-run
time python main.py --dry-run # Second run should be faster
```

### CI Performance Tracking
- Monitor GitHub Actions workflow duration trends
- Track test suite execution time with `pytest --durations=10`
- Use CI artifacts to store timing data across runs

## Common Bottlenecks

### 1. Cold Start Downloads
**Symptom**: Slow first run
**Causes**:
- Downloading all blocklists from scratch
- No persistent cache or stale cache
**Solutions**:
- ✅ Persistent disk cache with ETag/Last-Modified (implemented)
- ✅ HTTP conditional requests (304 Not Modified) (implemented); see the sketch after this list
- Future: Parallel DNS validation (deferred due to complexity)
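
For illustration only, a conditional fetch with httpx could look like the sketch below. The cache shape and function name are hypothetical and are not the project's actual `_gh_get` / disk-cache code.

```python
from typing import Any, Dict

import httpx

# Hypothetical stand-in for the persistent disk cache: url -> {"etag": ..., "body": ...}
_etag_cache: Dict[str, Dict[str, Any]] = {}


def fetch_with_etag(client: httpx.Client, url: str) -> Any:
    """Send If-None-Match so an unchanged blocklist costs a 304, not a full download."""
    headers = {}
    entry = _etag_cache.get(url)
    if entry and entry.get("etag"):
        headers["If-None-Match"] = entry["etag"]

    response = client.get(url, headers=headers)
    if response.status_code == 304 and entry:
        return entry["body"]  # not modified: reuse the cached payload

    response.raise_for_status()
    _etag_cache[url] = {"etag": response.headers.get("ETag", ""), "body": response.json()}
    return _etag_cache[url]["body"]
```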

### 2. API Request Overhead
**Symptom**: High latency even with small rule sets
**Causes**:
- Too many small API calls
- No request batching
**Solutions**:
- Batch rule updates (500 items per request); see the batching sketch after this list
- Track API call counts to identify inefficiencies
- Connection pooling (httpx already configured)
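
The batching itself is simple; this sketch shows the idea with the documented 500-item limit. The function and the `post_batch` callback are hypothetical, not the real Control D client API.

```python
from typing import Callable, Iterable, List


def post_in_batches(rules: Iterable[str], post_batch: Callable[[List[str]], None],
                    batch_size: int = 500) -> int:
    """Send rules in fixed-size batches; returns the number of requests issued."""
    batch: List[str] = []
    requests_made = 0
    for rule in rules:
        batch.append(rule)
        if len(batch) == batch_size:
            post_batch(batch)
            requests_made += 1
            batch = []
    if batch:  # flush the final partial batch
        post_batch(batch)
        requests_made += 1
    return requests_made
```

Counting requests at this level is the kind of number that feeds the "API calls made" line in the sync summary.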

### 3. Test Suite Duration
**Symptom**: Slow CI runs
**Causes**:
- Sequential test execution
- No dependency caching
**Solutions**:
- ✅ pytest-xdist for parallel execution (implemented)
- ✅ CI pip dependency caching (implemented)

## Performance Engineering Workflow

### 1. Identify Target
Review the performance plan discussion and choose a specific bottleneck.

### 2. Establish Baseline
Before making changes:
```bash
# Run tests and record timing
pytest --durations=10 > baseline-tests.txt

# Run actual sync and record metrics
python main.py --dry-run > baseline-sync.txt 2>&1
```

### 3. Implement Change
Make focused, minimal changes. Avoid premature optimization.

### 4. Measure Impact
After changes:
```bash
# Compare test timing
pytest --durations=10 > optimized-tests.txt
diff baseline-tests.txt optimized-tests.txt

# Compare sync performance
python main.py --dry-run > optimized-sync.txt 2>&1
diff baseline-sync.txt optimized-sync.txt
```

### 5. Validate Correctness
Ensure tests still pass:
```bash
pytest -n auto # Parallel test execution
```

## Success Criteria

### For This Optimization (API Call Tracking)
- ✅ API call counter added to global stats
- ✅ Summary output includes "API calls made"
- ✅ No regression in existing tests
- ✅ Non-intrusive implementation (minimal code changes)

### General Performance PR Requirements
- Clear before/after measurements
- No test regressions
- Documentation of trade-offs
- Reproducible test instructions

## Graceful Degradation

Performance optimizations must fail safely:
- Corrupted cache should trigger a clean cold start, not a crash (see the sketch after this list)
- Failed API calls should retry with exponential backoff
- Performance tracking failures should not block sync
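
A minimal sketch of that fail-safe cache load, with a hypothetical function name and path handling:

```python
import json
import logging
from pathlib import Path
from typing import Any, Dict

logger = logging.getLogger(__name__)


def load_disk_cache(path: Path) -> Dict[str, Any]:
    """Load the persistent cache; any corruption degrades to an empty cache (cold start)."""
    try:
        with path.open("r", encoding="utf-8") as handle:
            data = json.load(handle)
        if not isinstance(data, dict):
            raise ValueError("cache root must be a JSON object")
        return data
    except FileNotFoundError:
        return {}  # first run: nothing cached yet, so a cold start is expected
    except (OSError, ValueError) as exc:  # ValueError also covers JSONDecodeError
        logger.warning("Cache unreadable (%s); falling back to a clean cold start", exc)
        return {}
```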

## Future Optimization Targets

Based on maintainer priorities:
1. **Performance regression tests**: Automated benchmarks in CI
2. **Improved retry logic**: Exponential backoff with jitter
3. **Memory efficiency**: Streaming for 100k+ rule sets (low priority)
4. **Rate limit header parsing**: Required before any parallelization increase; a header-parsing sketch follows this list
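
A header-parsing sketch is shown below for reference. The `Retry-After` and `X-RateLimit-Remaining` names are common HTTP conventions, not confirmed Control D behavior, so treat them as assumptions to verify against real responses.

```python
from typing import Optional

import httpx


def rate_limit_remaining(response: httpx.Response) -> Optional[int]:
    """Remaining request budget advertised by the server, if any."""
    value = response.headers.get("X-RateLimit-Remaining")
    return int(value) if value is not None and value.isdigit() else None


def retry_after_seconds(response: httpx.Response, default: float = 60.0) -> float:
    """Backoff to apply after a 429, preferring the server's Retry-After hint."""
    value = response.headers.get("Retry-After")
    if value is None:
        return default
    try:
        return max(float(value), 0.0)
    except ValueError:
        return default  # Retry-After may be an HTTP date; keep the conservative default
```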
23 changes: 22 additions & 1 deletion main.py
@@ -495,6 +495,7 @@ def _api_client() -> httpx.Client:
# to enable fast cold-start syncs via conditional HTTP requests (304 Not Modified)
_disk_cache: Dict[str, Dict[str, Any]] = {} # Loaded from disk on startup
_cache_stats = {"hits": 0, "misses": 0, "validations": 0, "errors": 0}
_api_stats = {"control_d_api_calls": 0, "blocklist_fetches": 0}
Copilot AI Feb 17, 2026

The _api_stats dictionary is accessed concurrently from multiple threads without synchronization. Both _gh_get (called from warm_up_cache with ThreadPoolExecutor) and _api_get/_api_post/_api_delete (called from get_all_existing_rules with ThreadPoolExecutor max_workers=5, and from delete operations with DELETE_WORKERS=3) can increment these counters simultaneously. This creates a race condition where increments can be lost.

Dictionary operations like += are not atomic in Python - they involve read-modify-write cycles that can interleave, causing lost updates. For example, if two threads read the same value 5, both increment to 6, and both write 6, one increment is lost.

Consider using the existing _cache_lock (or creating a dedicated lock) to protect these counter updates, similar to how _cache_stats updates within _cache_lock protection (line 901, 931). Alternatively, use threading.Lock() or atomic counter classes from the threading module.



def get_cache_dir() -> Path:
@@ -821,18 +822,26 @@ def validate_folder_data(data: Dict[str, Any], url: str) -> bool:


def _api_get(client: httpx.Client, url: str) -> httpx.Response:
    global _api_stats
    _api_stats["control_d_api_calls"] += 1
    return _retry_request(lambda: client.get(url))


def _api_delete(client: httpx.Client, url: str) -> httpx.Response:
    global _api_stats
    _api_stats["control_d_api_calls"] += 1
    return _retry_request(lambda: client.delete(url))


def _api_post(client: httpx.Client, url: str, data: Dict) -> httpx.Response:
    global _api_stats
    _api_stats["control_d_api_calls"] += 1
    return _retry_request(lambda: client.post(url, data=data))


def _api_post_form(client: httpx.Client, url: str, data: Dict) -> httpx.Response:
    global _api_stats
    _api_stats["control_d_api_calls"] += 1
Copilot AI Feb 17, 2026

The increment of _api_stats["control_d_api_calls"] is not protected by any lock. This function is called concurrently from multiple ThreadPoolExecutor contexts, creating race conditions that can lead to lost counter updates.

Consider protecting this update with a lock to ensure thread-safe counter increments.

Copilot AI Feb 17, 2026

The increment of _api_stats["control_d_api_calls"] is not protected by any lock. This function is called concurrently from multiple ThreadPoolExecutor contexts (e.g., get_all_existing_rules with max_workers=5 at line 1288, parallel deletions with DELETE_WORKERS=3 at line 1811). Without synchronization, concurrent increments can result in lost updates and inaccurate counter values.

Consider protecting this update with a lock to ensure thread-safe counter increments.

Suggested change

Current:

def _api_get(client: httpx.Client, url: str) -> httpx.Response:
    global _api_stats
    _api_stats["control_d_api_calls"] += 1
    return _retry_request(lambda: client.get(url))

def _api_delete(client: httpx.Client, url: str) -> httpx.Response:
    global _api_stats
    _api_stats["control_d_api_calls"] += 1
    return _retry_request(lambda: client.delete(url))

def _api_post(client: httpx.Client, url: str, data: Dict) -> httpx.Response:
    global _api_stats
    _api_stats["control_d_api_calls"] += 1
    return _retry_request(lambda: client.post(url, data=data))

def _api_post_form(client: httpx.Client, url: str, data: Dict) -> httpx.Response:
    global _api_stats
    _api_stats["control_d_api_calls"] += 1

Suggested:

# Lock to ensure thread-safe updates to _api_stats in multithreaded contexts
_api_stats_lock = threading.Lock()

def _api_get(client: httpx.Client, url: str) -> httpx.Response:
    global _api_stats
    # Protect increment to avoid lost updates when called from ThreadPoolExecutor
    with _api_stats_lock:
        _api_stats["control_d_api_calls"] += 1
    return _retry_request(lambda: client.get(url))

def _api_delete(client: httpx.Client, url: str) -> httpx.Response:
    global _api_stats
    # Same lock used across all API helper methods for consistent synchronization
    with _api_stats_lock:
        _api_stats["control_d_api_calls"] += 1
    return _retry_request(lambda: client.delete(url))

def _api_post(client: httpx.Client, url: str, data: Dict) -> httpx.Response:
    global _api_stats
    with _api_stats_lock:
        _api_stats["control_d_api_calls"] += 1
    return _retry_request(lambda: client.post(url, data=data))

def _api_post_form(client: httpx.Client, url: str, data: Dict) -> httpx.Response:
    global _api_stats
    with _api_stats_lock:
        _api_stats["control_d_api_calls"] += 1

    return _retry_request(
        lambda: client.post(
            url,
@@ -884,14 +893,17 @@ def _gh_get(url: str) -> Dict:

    SECURITY: Validates data structure regardless of cache source
    """
    global _cache_stats
    global _cache_stats, _api_stats

    # First check: Quick check without holding lock for long
    with _cache_lock:
        if url in _cache:
            _cache_stats["hits"] += 1
            return _cache[url]

    # Track that we're about to make a blocklist fetch
    _api_stats["blocklist_fetches"] += 1

    # Check disk cache for conditional request headers
    headers = {}
    cached_entry = _disk_cache.get(url)
@@ -2120,6 +2132,15 @@ def validate_profile_input(value: str) -> bool:
    )
    print("=" * table_width + "\n")

    # Display API statistics
    total_api_calls = _api_stats["control_d_api_calls"] + _api_stats["blocklist_fetches"]
    if total_api_calls > 0:
        print(f"{Colors.BOLD}API Statistics:{Colors.ENDC}")
        print(f" • Control D API calls: {_api_stats['control_d_api_calls']:>6,}")
        print(f" • Blocklist fetches: {_api_stats['blocklist_fetches']:>6,}")
        print(f" • Total API requests: {total_api_calls:>6,}")
Comment on lines +2149 to +2151
Copilot AI Feb 17, 2026

The alignment in the API statistics display is inconsistent with the cache statistics below. Both the API stats (lines 2139-2141) and the cache stats (lines 2147-2149) use `:>6,` formatting, but the labels have different lengths, which causes misalignment.

Specifically:

  • "Control D API calls: " has 2 spaces before the number
  • "Blocklist fetches: " has 4 spaces before the number
  • "Total API requests: " has 3 spaces before the number
  • "Hits (in-memory): " has 5 spaces before the number

For visual consistency and readability, the spacing should be uniform. Consider aligning all labels to the same width, similar to how the cache statistics maintain consistent spacing.
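
One way to get uniform spacing is to pad every label to a shared width; the snippet below is a small illustration of that approach, not the PR's actual output code, and the example values are made up.

```python
# Pad every label to the same width so the numbers land in one column.
stats = {"Control D API calls": 1234, "Blocklist fetches": 23, "Total API requests": 1257}
width = max(len(name) for name in stats) + 1  # room for the trailing colon
for name, value in stats.items():
    print(f"  • {(name + ':').ljust(width)} {value:>6,}")
```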

        print()

    # Display cache statistics if any cache activity occurred
    if _cache_stats["hits"] + _cache_stats["misses"] + _cache_stats["validations"] > 0:
        print(f"{Colors.BOLD}Cache Statistics:{Colors.ENDC}")
111 changes: 111 additions & 0 deletions tests/test_api_tracking.py
@@ -0,0 +1,111 @@
"""Test API call tracking functionality"""
import unittest
from unittest.mock import patch, MagicMock
import sys
sys.path.insert(0, '.')


class TestAPITracking(unittest.TestCase):
    """Tests for API call tracking and statistics"""

    def test_api_stats_initialized(self):
        """Test that _api_stats is properly initialized"""
        import main
        # Should have both counters
        self.assertIn("control_d_api_calls", main._api_stats)
        self.assertIn("blocklist_fetches", main._api_stats)
        # Should start at 0
        self.assertIsInstance(main._api_stats["control_d_api_calls"], int)
        self.assertIsInstance(main._api_stats["blocklist_fetches"], int)

    @patch('main.httpx.Client')
    def test_api_get_increments_counter(self, mock_client_class):
        """Test that _api_get increments the API call counter"""
        import main

        # Reset counter
        main._api_stats["control_d_api_calls"] = 0
Copilot AI Feb 17, 2026

The test manually resets _api_stats["control_d_api_calls"] = 0 but doesn't restore the original value after the test completes. This breaks test isolation - if the counter had a non-zero value before the test (from previous tests or module initialization), that state is lost and could affect subsequent tests.

Consider using setUp/tearDown methods to save and restore the original counter values, similar to how test_cache_optimization.py handles _cache.clear() in setUp (lines 24-25) and tearDown (lines 29-30). This ensures each test starts with a clean state and doesn't leak state changes to other tests.
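
A minimal sketch of that isolation pattern, assuming the module-level `main._api_stats` and `main._cache` objects shown in this PR; the class name here is hypothetical.

```python
import unittest

import main  # module under test, assumed importable from the repository root


class TestAPITrackingIsolated(unittest.TestCase):
    def setUp(self):
        # Snapshot the counters and in-memory cache so each test starts clean.
        self._saved_api_stats = dict(main._api_stats)
        self._saved_cache = dict(main._cache)
        main._api_stats.update({"control_d_api_calls": 0, "blocklist_fetches": 0})
        main._cache.clear()

    def tearDown(self):
        # Restore the exact pre-test state so nothing leaks into later tests.
        main._api_stats.clear()
        main._api_stats.update(self._saved_api_stats)
        main._cache.clear()
        main._cache.update(self._saved_cache)
```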


        # Mock the client and response
        mock_client = MagicMock()
        mock_response = MagicMock()
        mock_response.raise_for_status = MagicMock()
        mock_client.get.return_value = mock_response

        # Call _api_get
        main._api_get(mock_client, "http://test.url")

        # Verify counter was incremented
        self.assertEqual(main._api_stats["control_d_api_calls"], 1)

    @patch('main.httpx.Client')
    def test_api_post_increments_counter(self, mock_client_class):
        """Test that _api_post increments the API call counter"""
        import main

        # Reset counter
        main._api_stats["control_d_api_calls"] = 0
Copilot AI Feb 17, 2026

The test manually resets _api_stats["control_d_api_calls"] = 0 but doesn't restore the original value after the test completes. This breaks test isolation and could affect subsequent tests.

Consider using setUp/tearDown methods to save and restore the original counter values to ensure proper test isolation.


        # Mock the client and response
        mock_client = MagicMock()
        mock_response = MagicMock()
        mock_response.raise_for_status = MagicMock()
        mock_client.post.return_value = mock_response

        # Call _api_post
        main._api_post(mock_client, "http://test.url", {"key": "value"})

        # Verify counter was incremented
        self.assertEqual(main._api_stats["control_d_api_calls"], 1)

    @patch('main.httpx.Client')
    def test_api_delete_increments_counter(self, mock_client_class):
        """Test that _api_delete increments the API call counter"""
        import main

        # Reset counter
        main._api_stats["control_d_api_calls"] = 0
Copilot AI Feb 17, 2026

The test manually resets _api_stats["control_d_api_calls"] = 0 but doesn't restore the original value after the test completes. This breaks test isolation and could affect subsequent tests.

Consider using setUp/tearDown methods to save and restore the original counter values to ensure proper test isolation.


        # Mock the client and response
        mock_client = MagicMock()
        mock_response = MagicMock()
        mock_response.raise_for_status = MagicMock()
        mock_client.delete.return_value = mock_response

        # Call _api_delete
        main._api_delete(mock_client, "http://test.url")

        # Verify counter was incremented
        self.assertEqual(main._api_stats["control_d_api_calls"], 1)

    @patch('main._gh')
    def test_gh_get_increments_blocklist_counter(self, mock_gh_client):
        """Test that _gh_get increments the blocklist fetch counter"""
        import main

        # Reset counters
        main._api_stats["blocklist_fetches"] = 0
        main._cache.clear()
Comment on lines +87 to +88
Copilot AI Feb 17, 2026

The test manually resets _api_stats["blocklist_fetches"] = 0 and calls main._cache.clear() but doesn't restore the original values after the test completes. This breaks test isolation and could affect subsequent tests.

Consider using setUp/tearDown methods to save and restore the original counter and cache values to ensure proper test isolation.


        # Mock the streaming response
        mock_response = MagicMock()
        mock_response.status_code = 200
        mock_response.headers = {}
        mock_response.iter_bytes.return_value = [b'{"test": "data"}']
        mock_response.__enter__ = MagicMock(return_value=mock_response)
        mock_response.__exit__ = MagicMock(return_value=False)

        mock_gh_client.stream.return_value = mock_response

        # Call _gh_get
        try:
            result = main._gh_get("http://test.blocklist.url")
        except Exception:
            pass  # May fail on validation, we just care about the counter

Check notice (codefactor.io / CodeFactor, tests/test_api_tracking.py#L103-L104): Try, Except, Pass detected. (B110)

Check notice (Code scanning / Bandit, lines +103 to +104): Try, Except, Pass detected.

        # Verify blocklist counter was incremented
        self.assertGreaterEqual(main._api_stats["blocklist_fetches"], 1)


if __name__ == '__main__':
    unittest.main()