Daily Perf Improver - Add API call tracking and enhanced summary reporting#270
Daily Perf Improver - Add API call tracking and enhanced summary reporting#270github-actions[bot] wants to merge 4 commits intomainfrom
Conversation
- Track Control D API calls (GET, POST, DELETE operations) - Track blocklist fetches separately - Display API statistics in sync summary output - Add comprehensive performance monitoring guide - Add test coverage for API tracking functionality Addresses maintainer feedback requesting API call counts in summary. Non-intrusive implementation with minimal code changes. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
Merging to
|
|
👋 Development Partner is reviewing this PR. Will provide feedback shortly. |
There was a problem hiding this comment.
Pull request overview
This PR implements API call tracking to provide visibility into sync performance by adding counters for Control D API calls and blocklist fetches. The implementation follows the existing pattern of global statistics dictionaries used in the codebase (_cache_stats) and displays the metrics in a new "API Statistics" section in the summary output.
Changes:
- Added
_api_statsglobal dictionary with counters forcontrol_d_api_callsandblocklist_fetches - Instrumented 5 functions to increment counters:
_api_get,_api_post,_api_delete,_api_post_form(Control D API), and_gh_get(blocklist fetches) - Added API Statistics display section in summary output before cache statistics
- Created comprehensive test suite with 6 test cases
- Added performance monitoring guide documentation
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 11 comments.
| File | Description |
|---|---|
| main.py | Added _api_stats global dictionary, instrumented API functions to increment counters, and added API Statistics display section in summary output |
| tests/test_api_tracking.py | New test file with 6 test cases covering counter initialization and increments for all instrumented functions |
| .github/copilot/instructions/performance-monitoring.md | New comprehensive guide documenting performance metrics, measurement strategies, bottlenecks, and best practices for the codebase |
| # to enable fast cold-start syncs via conditional HTTP requests (304 Not Modified) | ||
| _disk_cache: Dict[str, Dict[str, Any]] = {} # Loaded from disk on startup | ||
| _cache_stats = {"hits": 0, "misses": 0, "validations": 0, "errors": 0} | ||
| _api_stats = {"control_d_api_calls": 0, "blocklist_fetches": 0} |
There was a problem hiding this comment.
The _api_stats dictionary is accessed concurrently from multiple threads without synchronization. Both _gh_get (called from warm_up_cache with ThreadPoolExecutor) and _api_get/_api_post/_api_delete (called from get_all_existing_rules with ThreadPoolExecutor max_workers=5, and from delete operations with DELETE_WORKERS=3) can increment these counters simultaneously. This creates a race condition where increments can be lost.
Dictionary operations like += are not atomic in Python - they involve read-modify-write cycles that can interleave, causing lost updates. For example, if two threads read the same value 5, both increment to 6, and both write 6, one increment is lost.
Consider using the existing _cache_lock (or creating a dedicated lock) to protect these counter updates, similar to how _cache_stats updates within _cache_lock protection (line 901, 931). Alternatively, use threading.Lock() or atomic counter classes from the threading module.
main.py
Outdated
|
|
||
| def _api_post_form(client: httpx.Client, url: str, data: Dict) -> httpx.Response: | ||
| global _api_stats | ||
| _api_stats["control_d_api_calls"] += 1 |
There was a problem hiding this comment.
The increment of _api_stats["control_d_api_calls"] is not protected by any lock. This function is called concurrently from multiple ThreadPoolExecutor contexts, creating race conditions that can lead to lost counter updates.
Consider protecting this update with a lock to ensure thread-safe counter increments.
| import main | ||
|
|
||
| # Reset counter | ||
| main._api_stats["control_d_api_calls"] = 0 |
There was a problem hiding this comment.
The test manually resets _api_stats["control_d_api_calls"] = 0 but doesn't restore the original value after the test completes. This breaks test isolation and could affect subsequent tests.
Consider using setUp/tearDown methods to save and restore the original counter values to ensure proper test isolation.
| print(f" • Control D API calls: {_api_stats['control_d_api_calls']:>6,}") | ||
| print(f" • Blocklist fetches: {_api_stats['blocklist_fetches']:>6,}") | ||
| print(f" • Total API requests: {total_api_calls:>6,}") |
There was a problem hiding this comment.
The alignment in the API statistics display is inconsistent with the cache statistics below. The API stats use :>6, formatting (lines 2139-2141), while the cache stats use :>6, as well (lines 2147-2149), but the labels have different lengths causing misalignment.
Specifically:
- "Control D API calls: " has 2 spaces before the number
- "Blocklist fetches: " has 4 spaces before the number
- "Total API requests: " has 3 spaces before the number
- "Hits (in-memory): " has 5 spaces before the number
For visual consistency and readability, the spacing should be uniform. Consider aligning all labels to the same width, similar to how the cache statistics maintain consistent spacing.
main.py
Outdated
| def _api_get(client: httpx.Client, url: str) -> httpx.Response: | ||
| global _api_stats | ||
| _api_stats["control_d_api_calls"] += 1 | ||
| return _retry_request(lambda: client.get(url)) | ||
|
|
||
|
|
||
| def _api_delete(client: httpx.Client, url: str) -> httpx.Response: | ||
| global _api_stats | ||
| _api_stats["control_d_api_calls"] += 1 | ||
| return _retry_request(lambda: client.delete(url)) | ||
|
|
||
|
|
||
| def _api_post(client: httpx.Client, url: str, data: Dict) -> httpx.Response: | ||
| global _api_stats | ||
| _api_stats["control_d_api_calls"] += 1 | ||
| return _retry_request(lambda: client.post(url, data=data)) | ||
|
|
||
|
|
||
| def _api_post_form(client: httpx.Client, url: str, data: Dict) -> httpx.Response: | ||
| global _api_stats | ||
| _api_stats["control_d_api_calls"] += 1 |
There was a problem hiding this comment.
The increment of _api_stats["control_d_api_calls"] is not protected by any lock. This function is called concurrently from multiple ThreadPoolExecutor contexts (e.g., get_all_existing_rules with max_workers=5 at line 1288, parallel deletions with DELETE_WORKERS=3 at line 1811). Without synchronization, concurrent increments can result in lost updates and inaccurate counter values.
Consider protecting this update with a lock to ensure thread-safe counter increments.
| def _api_get(client: httpx.Client, url: str) -> httpx.Response: | |
| global _api_stats | |
| _api_stats["control_d_api_calls"] += 1 | |
| return _retry_request(lambda: client.get(url)) | |
| def _api_delete(client: httpx.Client, url: str) -> httpx.Response: | |
| global _api_stats | |
| _api_stats["control_d_api_calls"] += 1 | |
| return _retry_request(lambda: client.delete(url)) | |
| def _api_post(client: httpx.Client, url: str, data: Dict) -> httpx.Response: | |
| global _api_stats | |
| _api_stats["control_d_api_calls"] += 1 | |
| return _retry_request(lambda: client.post(url, data=data)) | |
| def _api_post_form(client: httpx.Client, url: str, data: Dict) -> httpx.Response: | |
| global _api_stats | |
| _api_stats["control_d_api_calls"] += 1 | |
| # Lock to ensure thread-safe updates to _api_stats in multithreaded contexts | |
| _api_stats_lock = threading.Lock() | |
| def _api_get(client: httpx.Client, url: str) -> httpx.Response: | |
| global _api_stats | |
| # Protect increment to avoid lost updates when called from ThreadPoolExecutor | |
| with _api_stats_lock: | |
| _api_stats["control_d_api_calls"] += 1 | |
| return _retry_request(lambda: client.get(url)) | |
| def _api_delete(client: httpx.Client, url: str) -> httpx.Response: | |
| global _api_stats | |
| # Same lock used across all API helper methods for consistent synchronization | |
| with _api_stats_lock: | |
| _api_stats["control_d_api_calls"] += 1 | |
| return _retry_request(lambda: client.delete(url)) | |
| def _api_post(client: httpx.Client, url: str, data: Dict) -> httpx.Response: | |
| global _api_stats | |
| with _api_stats_lock: | |
| _api_stats["control_d_api_calls"] += 1 | |
| return _retry_request(lambda: client.post(url, data=data)) | |
| def _api_post_form(client: httpx.Client, url: str, data: Dict) -> httpx.Response: | |
| global _api_stats | |
| with _api_stats_lock: | |
| _api_stats["control_d_api_calls"] += 1 |
main.py
Outdated
| def _api_get(client: httpx.Client, url: str) -> httpx.Response: | ||
| global _api_stats | ||
| _api_stats["control_d_api_calls"] += 1 | ||
| return _retry_request(lambda: client.get(url)) | ||
|
|
||
|
|
||
| def _api_delete(client: httpx.Client, url: str) -> httpx.Response: | ||
| global _api_stats | ||
| _api_stats["control_d_api_calls"] += 1 | ||
| return _retry_request(lambda: client.delete(url)) | ||
|
|
||
|
|
||
| def _api_post(client: httpx.Client, url: str, data: Dict) -> httpx.Response: | ||
| global _api_stats | ||
| _api_stats["control_d_api_calls"] += 1 | ||
| return _retry_request(lambda: client.post(url, data=data)) | ||
|
|
||
|
|
||
| def _api_post_form(client: httpx.Client, url: str, data: Dict) -> httpx.Response: | ||
| global _api_stats | ||
| _api_stats["control_d_api_calls"] += 1 |
There was a problem hiding this comment.
The increment of _api_stats["control_d_api_calls"] is not protected by any lock. This function is called concurrently from multiple ThreadPoolExecutor contexts, creating race conditions that can lead to lost counter updates.
Consider protecting this update with a lock to ensure thread-safe counter increments.
| def _api_get(client: httpx.Client, url: str) -> httpx.Response: | |
| global _api_stats | |
| _api_stats["control_d_api_calls"] += 1 | |
| return _retry_request(lambda: client.get(url)) | |
| def _api_delete(client: httpx.Client, url: str) -> httpx.Response: | |
| global _api_stats | |
| _api_stats["control_d_api_calls"] += 1 | |
| return _retry_request(lambda: client.delete(url)) | |
| def _api_post(client: httpx.Client, url: str, data: Dict) -> httpx.Response: | |
| global _api_stats | |
| _api_stats["control_d_api_calls"] += 1 | |
| return _retry_request(lambda: client.post(url, data=data)) | |
| def _api_post_form(client: httpx.Client, url: str, data: Dict) -> httpx.Response: | |
| global _api_stats | |
| _api_stats["control_d_api_calls"] += 1 | |
| # Dedicated lock to ensure thread-safe updates of _api_stats counters. | |
| # This prevents lost increments when API helpers are used from multiple threads. | |
| _api_stats_lock = threading.Lock() | |
| def _api_get(client: httpx.Client, url: str) -> httpx.Response: | |
| global _api_stats | |
| # Protect counter increment with a lock to avoid race conditions. | |
| with _api_stats_lock: | |
| _api_stats["control_d_api_calls"] += 1 | |
| return _retry_request(lambda: client.get(url)) | |
| def _api_delete(client: httpx.Client, url: str) -> httpx.Response: | |
| global _api_stats | |
| # Protect counter increment with a lock to avoid race conditions. | |
| with _api_stats_lock: | |
| _api_stats["control_d_api_calls"] += 1 | |
| return _retry_request(lambda: client.delete(url)) | |
| def _api_post(client: httpx.Client, url: str, data: Dict) -> httpx.Response: | |
| global _api_stats | |
| # Protect counter increment with a lock to avoid race conditions. | |
| with _api_stats_lock: | |
| _api_stats["control_d_api_calls"] += 1 | |
| return _retry_request(lambda: client.post(url, data=data)) | |
| def _api_post_form(client: httpx.Client, url: str, data: Dict) -> httpx.Response: | |
| global _api_stats | |
| # Protect counter increment with a lock to avoid race conditions. | |
| with _api_stats_lock: | |
| _api_stats["control_d_api_calls"] += 1 |
| import main | ||
|
|
||
| # Reset counter | ||
| main._api_stats["control_d_api_calls"] = 0 |
There was a problem hiding this comment.
The test manually resets _api_stats["control_d_api_calls"] = 0 but doesn't restore the original value after the test completes. This breaks test isolation - if the counter had a non-zero value before the test (from previous tests or module initialization), that state is lost and could affect subsequent tests.
Consider using setUp/tearDown methods to save and restore the original counter values, similar to how test_cache_optimization.py handles _cache.clear() in setUp (lines 24-25) and tearDown (lines 29-30). This ensures each test starts with a clean state and doesn't leak state changes to other tests.
| import main | ||
|
|
||
| # Reset counter | ||
| main._api_stats["control_d_api_calls"] = 0 |
There was a problem hiding this comment.
The test manually resets _api_stats["control_d_api_calls"] = 0 but doesn't restore the original value after the test completes. This breaks test isolation and could affect subsequent tests.
Consider using setUp/tearDown methods to save and restore the original counter values to ensure proper test isolation.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
|
👋 Development Partner is reviewing this PR. Will provide feedback shortly. |
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
|
👋 Development Partner is reviewing this PR. Will provide feedback shortly. |
|
Hi @copilot, please open a new pull request to apply changes based on the comments in this thread. Thanks! |
|
@abhimehro I've opened a new pull request, #297, to work on those changes. Once the pull request is ready, I'll request review from you. |
Goal and Rationale
This PR implements API call tracking to provide visibility into sync performance, directly addressing maintainer feedback requesting "API calls made" in the summary output.
API call counts are critical performance metrics that help:
Approach
Implemented minimalist instrumentation at strategic API boundaries:
_api_get,_api_post,_api_delete,_api_post_form_gh_getbefore HTTP requestImpact Measurement
Code Changes (Minimal)
Performance Impact
Example Output
Trade-offs
Pros:
Cons:
_cache_stats)Validation
Testing Approach
test_api_tracking.pyTest Commands
Reproducibility
To measure API call tracking in action:
Expected result: Warm cache should show fewer blocklist fetches due to 304 responses.
Future Work
Identified during implementation:
Files Changed
New Files
.github/copilot/instructions/performance-monitoring.md: Comprehensive performance engineering guide (157 lines)tests/test_api_tracking.py: Test coverage for API tracking (111 lines)Modified Files
main.py:_api_statsglobal dictionaryNo configuration files, documentation, or CI workflows modified.
Security Considerations
What Happens Next
Once merged, subsequent syncs will display API statistics in the summary output, providing immediate feedback on:
This data enables future optimizations to be measured quantitatively.