
Daily Perf Improver - Add jitter to retry backoff for improved API reliability #295

Open
github-actions[bot] wants to merge 7 commits into main from perf/add-jitter-to-retry-backoff-336fbb90ca0980f2

Conversation

@github-actions

Goal and Rationale

Performance target: Improve retry reliability under API failures by preventing the thundering herd problem

Why it matters: When multiple clients experience simultaneous failures (e.g., API outage), synchronized retries create load spikes that can overwhelm a recovering server. This is a critical concern for rate-limited APIs like Control D.

Maintainer priority: Directly addresses feedback from discussion #219: "exponential backoff with jitter" and "API rate limits are non-negotiable"

Approach

Implemented ±50% jitter on retry delays using a standard jittered exponential backoff formula:

wait_time = (base_delay * 2^attempt) * (0.5 + random())

This spreads retries across a time window while maintaining exponential backoff behavior.

Example: A 4-second base delay becomes a 2-6 second range, preventing simultaneous retries from 100 clients at exactly t=4s.
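
A minimal Python sketch of that calculation (illustrative only; the function and variable names are assumptions, not the exact code in main.py):

import random

def jittered_backoff(base_delay, attempt):
    """Exponential backoff with +/-50% jitter, matching the formula above."""
    base_wait = base_delay * (2 ** attempt)   # exponential growth: 1s, 2s, 4s, ...
    jitter_factor = 0.5 + random.random()     # uniform in [0.5, 1.5)
    return base_wait * jitter_factor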

Implementation

Changes made:

  1. Added random module import
  2. Modified _retry_request() to apply a jitter factor in [0.5, 1.5] to the exponential backoff (see the sketch after this list)
  3. Updated log format to show actual jittered delay with 2 decimal precision
  4. Created comprehensive test suite (7 test cases)
  5. Added API retry strategy guide for future development
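
A sketch of how the modified retry helper could look with these changes applied. This is a hedged reconstruction, not the shipped code: only the signature used by the tests (request_func, max_retries, delay) and the jitter math are taken from this PR; the logging call, exception handling, and exact control flow are assumptions.

import logging
import random
import time

logger = logging.getLogger(__name__)

def _retry_request(request_func, max_retries=5, delay=1):
    """Call request_func() until it succeeds or max_retries is reached.

    Retries use exponential backoff with +/-50% jitter; 4xx responses
    (other than 429) fail fast.
    """
    response = None
    for attempt in range(max_retries):
        response = request_func()

        if response.status_code < 400:
            return response

        # Fail fast on client errors, except 429 (rate limiting), which is retryable.
        if 400 <= response.status_code < 500 and response.status_code != 429:
            response.raise_for_status()

        if attempt < max_retries - 1:
            # Jitter: multiply by random factor in [0.5, 1.5] to spread retries
            base_wait = delay * (2 ** attempt)
            jitter_factor = 0.5 + random.random()
            wait_time = base_wait * jitter_factor
            logger.warning("Request failed (%s), retrying in %.2fs", response.status_code, wait_time)
            time.sleep(wait_time)

    # All attempts exhausted: surface the last failure.
    response.raise_for_status()
    return response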

Code quality:

  • Minimal changes: 11 lines modified in core retry logic
  • Preserves all existing behavior: 4xx fail-fast, max retries, exponential growth
  • Backward compatible: no API changes

Impact Measurement

Synthetic Benchmark Results

Run python3 benchmark_retry_jitter.py to see the demonstration:

Without jitter (old):

  • All 100 clients retry at t=1s → server receives 100 simultaneous requests
  • All 100 clients retry at t=2s → server receives 100 simultaneous requests
  • Predictable load spikes during recovery

With jitter (new):

  • 100 clients spread across t=0.5s to t=1.5s (1-second window)
  • Reduced peak concurrent load on server
  • Retries distributed over time, improving recovery success rate
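
A minimal simulation of the spread described above (an illustrative stand-in, not the actual benchmark_retry_jitter.py shipped in this PR):

import random
from collections import Counter

def peak_concurrent_retries(num_clients=100, base_delay=1.0, jitter=True):
    """Bucket each client's first retry time into 0.1s bins and return the largest bin."""
    retry_times = []
    for _ in range(num_clients):
        factor = (0.5 + random.random()) if jitter else 1.0
        retry_times.append(base_delay * factor)
    bins = Counter(round(t, 1) for t in retry_times)
    return max(bins.values())

print("peak without jitter:", peak_concurrent_retries(jitter=False))  # all 100 clients in one bin
print("peak with jitter:   ", peak_concurrent_retries(jitter=True))   # typically 10-20 per bin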

Performance Overhead

  • Zero overhead on successful requests (jitter only applies to retry path)
  • Microseconds per retry (random.random() is ~1µs)
  • Negligible compared to network I/O (typical retry delay: 1-16 seconds)
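
A quick way to sanity-check the overhead claim with the standard library (illustrative; not part of the changed files):

import timeit

# One million jitter computations; each takes a small fraction of a microsecond,
# which is negligible next to retry delays measured in seconds.
per_call = timeit.timeit("0.5 + random.random()", setup="import random", number=1_000_000) / 1_000_000
print(f"{per_call * 1e9:.0f} ns per jitter computation")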

Reliability Impact

Before:

  • Thundering herd during API outages
  • All clients retry simultaneously → cascading failures
  • Higher 429 rate limit errors during recovery

After:

  • Distributed retry timing
  • Reduced server load spikes
  • Better API recovery outcomes

Trade-offs

Complexity: Minimal - added one random multiplication
Maintainability: Improved - added comprehensive documentation and tests
Determinism: Retries are now randomized, but within predictable bounds

Validation

Test Coverage

Added tests/test_retry_jitter.py with 7 test cases:

  1. ✅ Jitter adds randomness (verify delays differ across runs)
  2. ✅ Jitter stays within bounds [0.5x, 1.5x base delay]
  3. ✅ Exponential backoff still increases despite jitter
  4. ✅ 4xx errors still fail fast (no retries)
  5. ✅ 429 rate limits retry with jitter
  6. ✅ Successful retry after transient failures
  7. ✅ Max retries limit respected
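
As an illustration of the style used for case 2, a bounds check might look roughly like this (a sketch assuming the helper is importable as main._retry_request and that main uses time.sleep for delays; the shipped tests may differ):

from unittest.mock import MagicMock, patch

import main

def test_jitter_stays_within_bounds():
    """Every sleep should fall within [0.5, 1.5] x the exponential base delay."""
    request_func = MagicMock(return_value=MagicMock(status_code=500))

    with patch("main.time.sleep") as mock_sleep:
        try:
            main._retry_request(request_func, max_retries=4, delay=1)
        except Exception:
            pass  # all attempts fail; only the recorded sleep values matter here

    assert mock_sleep.call_count == 3
    for attempt, call in enumerate(mock_sleep.call_args_list):
        base_wait = 1 * (2 ** attempt)
        assert 0.5 * base_wait <= call.args[0] <= 1.5 * base_wait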

Testing approach:

# Run jitter-specific tests
pytest tests/test_retry_jitter.py -v

# Run full test suite to ensure no regressions
pytest tests/ -n auto -v

Reproducibility

Quick validation:

# Demonstrate jitter behavior visually
python3 benchmark_retry_jitter.py

# Expected output shows:
# - WITHOUT JITTER: deterministic delays (1s, 2s, 4s, 8s)
# - WITH JITTER: randomized delays within bounds (e.g., 1.26s, 2.67s, 2.52s, 4.58s)

Integration test:

# The existing sync workflow will automatically use jittered retries
# No configuration changes required
python3 main.py --dry-run

Future Work

Based on API retry strategy guide (.github/copilot/instructions/api-retry-strategy.md):

  1. Rate limit header parsing: Read Retry-After from 429 responses
  2. Circuit breaker: Stop retrying after consecutive failures
  3. Per-endpoint strategies: Different backoff for read vs. write operations
  4. Max backoff cap: Prevent indefinite delays on later retries
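
For example, items 1 and 4 could build directly on the jittered delay (a rough sketch, not part of this PR; it assumes integer-second Retry-After values):

import random

MAX_BACKOFF = 60.0  # item 4: never wait longer than this, even on late attempts

def next_wait(response, base_delay, attempt):
    """Prefer the server's Retry-After hint on 429s, else use capped jittered backoff."""
    if response is not None and response.status_code == 429:
        retry_after = response.headers.get("Retry-After", "")
        if retry_after.isdigit():                        # item 1: honour the server's hint
            return min(float(retry_after), MAX_BACKOFF)
    wait = base_delay * (2 ** attempt) * (0.5 + random.random())
    return min(wait, MAX_BACKOFF)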

Files Changed

  • main.py: Added jitter to retry logic (11 lines changed)
  • tests/test_retry_jitter.py: 7 comprehensive test cases (new file)
  • .github/copilot/instructions/api-retry-strategy.md: Performance guide (new file)
  • benchmark_retry_jitter.py: Interactive demonstration tool (new file)

Addresses: Performance target from discussion #219
Risk level: Low (only affects error path, extensively tested)
Performance gain: Prevents thundering herd, improves API reliability under load

AI generated by Daily Perf Improver

Implements randomized retry delays (±50% jitter) to prevent thundering
herd when multiple failed requests retry simultaneously.

**Performance Impact:**
- Prevents API server load spikes during retry storms
- Distributes retry timing across 2-6s range instead of synchronized 4s
- Reduces likelihood of cascading failures during API outages
- Zero overhead on successful requests (only affects retry path)

**Implementation:**
- Added random module import
- Modified _retry_request() to multiply base backoff by random factor [0.5, 1.5]
- Updated log format to show actual jittered delay with 2 decimal places
- Maintains existing behavior: 4xx fail-fast, exponential growth, max retries

**Testing:**
- Added 7 comprehensive test cases covering jitter bounds, exponential growth,
  error handling, and successful retries
- Validates jitter stays within [0.5x, 1.5x] range
- Confirms 4xx errors still fail fast without retries
- Verifies 429 rate limits retry with jittered backoff

Addresses maintainer feedback from discussion #219 requesting
"exponential backoff with jitter" for improved retry reliability.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@trunk-io

trunk-io bot commented Feb 17, 2026

Merging to main in this repository is managed by Trunk.

  • To merge this pull request, check the box to the left or comment /trunk merge below.

@abhimehro abhimehro marked this pull request as ready for review February 17, 2026 05:24
Copilot AI review requested due to automatic review settings February 17, 2026 05:24
@abhimehro abhimehro self-assigned this Feb 17, 2026

Copilot AI left a comment

Pull request overview

Adds exponential backoff jitter to the HTTP retry path to reduce synchronized retry spikes (“thundering herd”) during upstream outages/rate-limiting, plus supporting tests and developer guidance.

Changes:

  • Apply ±50% jitter factor to _retry_request() exponential backoff delays and log the jittered delay with 2-decimal precision.
  • Add a new jitter-focused unit test suite for retry behavior.
  • Add a retry-strategy guide and a benchmark/demo script.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 7 comments.

File descriptions:

  • main.py: Adds jittered exponential backoff to _retry_request() and updates retry log formatting.
  • tests/test_retry_jitter.py: New tests intended to validate jitter bounds, retry behavior, and max retry enforcement.
  • .github/copilot/instructions/api-retry-strategy.md: New internal guide documenting the retry strategy and recommended jitter approach.
  • benchmark_retry_jitter.py: New demo script illustrating the difference between deterministic backoff and jittered backoff.

Comment on lines +53 to +56
# Due to jitter, wait times should differ between runs
# (with high probability - could theoretically be equal but extremely unlikely)
assert wait_times_run1 != wait_times_run2, \
"Jitter should produce different wait times across runs"

Copilot AI Feb 17, 2026

test_jitter_adds_randomness_to_retry_delays is probabilistic and can be flaky (there’s a non-zero chance both retry sequences produce identical sleep values). To make this deterministic, patch random.random() with two different known sequences (or assert that time.sleep was called with values derived from patched jitter factors) instead of relying on natural RNG variance.
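
A sketch of the deterministic variant being suggested (hypothetical; assumes main imports the random and time modules and that the jitter is computed via random.random()):

from unittest.mock import MagicMock, patch

import main

def test_jitter_factor_is_applied():
    """Drive jitter with a known sequence and assert the exact computed delays."""
    request_func = MagicMock(return_value=MagicMock(status_code=500))

    with patch("main.random.random", side_effect=[0.5, 0.25]), \
         patch("main.time.sleep") as mock_sleep:
        try:
            main._retry_request(request_func, max_retries=3, delay=1)
        except Exception:
            pass

    # attempt 0: 1 * 2**0 * (0.5 + 0.5) = 1.0; attempt 1: 1 * 2**1 * (0.5 + 0.25) = 1.5
    waits = [call.args[0] for call in mock_sleep.call_args_list]
    assert waits == [1.0, 1.5]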

@github-actions
Author

👋 Development Partner is reviewing this PR. Will provide feedback shortly.

Check notice — Code scanning / Bandit

Note: Standard pseudo-random generators are not suitable for security/cryptographic purposes. (Flagged on each of the jitter calculations below.)

delays = []
for attempt in range(max_retries - 1):
    base_wait = base_delay * (2 ** attempt)
    jitter_factor = 0.5 + random.random()  # [0.5, 1.5]

# Simulate retry distribution
retry_times = []
for _ in range(num_clients):
    first_retry = (base_delay * (0.5 + random.random()))

# Jitter: multiply by random factor in range [0.5, 1.5] to spread retries
# This prevents multiple failed requests from retrying simultaneously
base_wait = delay * (2**attempt)
jitter_factor = 0.5 + random.random()  # Random value between 0.5 and 1.5
Check notice — Code scanning / Bandit

Note (test): Use of assert detected. The enclosed code will be removed when compiling to optimised byte code. (Flagged on each assert statement in the new test suite; excerpts below.)

wait_times_run2 = [call.args[0] for call in mock_sleep.call_args_list]

# Both runs should have same number of retries (2 retries for 3 max_retries)
assert len(wait_times_run1) == 2
assert len(wait_times_run2) == 2

response = main._retry_request(request_func, max_retries=5, delay=1)

# Should have made 3 requests total (2 failures + 1 success)
assert request_func.call_count == 3

# Should have slept twice (after first two failures)
assert mock_sleep.call_count == 2

# Should return the successful response
assert response.status_code == 200

main._retry_request(request_func, max_retries=4, delay=1)

# Should attempt exactly max_retries times
assert request_func.call_count == 4

# Should sleep max_retries-1 times (no sleep after final failure)
assert mock_sleep.call_count == 3