refactor: add replication exponential histograms#7667
refactor: add replication exponential histograms#7667zawadzkidiana wants to merge 9 commits intocadence-workflow:masterfrom
Conversation
Keep the PR focused on replication histogram code by undoing local config and docker file changes from the original commit.
Pass a clock.TimeSource into TaskStore and use it for latency measurement and log throttling to keep timing consistent and testable. Co-authored-by: Cursor <cursoragent@cursor.com>
This reverts commit c65fb81.
Inject a clock.TimeSource into TaskStore to measure histogram latency and log throttling in a testable, consistent way.
df3202e to
9659bb6
Compare
| @@ -0,0 +1,193 @@ | |||
| package replication | |||
There was a problem hiding this comment.
I don't think we need to test the histograms. We can remove this file
There was a problem hiding this comment.
We should do either one by one or include all of the changes in one PR. Right now we are doing a change in the task_ack_manager and one in the task_store. I'm fine with either going one by one, file by file or all of them, but right now we are doing two different files and not all of them
There was a problem hiding this comment.
Also, for the cache size I think we should go with gauge and not histogram
Record replication cache size as a gauge rather than a timer when updating cache size metrics.
Drop the exponential histogram test file per review feedback.
Emit an exponential histogram alongside the replication tasks applied latency timer for consistent latency coverage.
🔍 CI failure analysis for f3d5207: Flaky test failure in TestActivityHeartbeatDetailsDuringRetry, unrelated to PR changes which only modify replication metrics instrumentation.IssueThe integration test
The test failed twice during the same CI run (at 19:30:56 and 19:44:49), indicating timing-sensitive behavior. Root CauseThis is a flaky test unrelated to the PR changes. The PR only modifies replication metrics instrumentation:
The failing test validates activity heartbeat details during retries, which is completely separate from replication metrics. DetailsEvidence this is a flaky test:
The test expects a heartbeat details count of 2 but receives 1, characteristic of race conditions or timing sensitivities in integration tests running with SQLite. Code Review ✅ Approved 1 resolved / 1 findingsClean implementation of exponential histogram metrics for replication latency tracking. Follows established dual-emit migration pattern, adds proper injectable time sources for testability, and includes a sensible bug fix changing CacheSize from Timer to Gauge. ✅ 1 resolved✅ Quality: Inconsistent time source usage in TaskStore
Tip Comment OptionsAuto-apply is off → Gitar will not commit updates to this branch. Comment with these commands to change:
Was this helpful? React with 👍 / 👎 | Gitar |
Summary
Test plan