admission: implement 1ms disk bandwidth error correction#163347
admission: implement 1ms disk bandwidth error correction#163347angeladietz wants to merge 5 commits intocockroachdb:masterfrom
Conversation
|
Merging to
|
|
@sumeerbhola My last commit admission: log extended durations of unchanged disk stats is to emulate the logging you added in your prototype here, but i think we could get visibility into the same info by adding a histogram. Do you have a preference between my approach and using a histogram? I personally thought a histogram might be unnecessary since its more so for us to get a sense of how useful 1ms error correction is and less so an indicator for cluster health. |
sumeerbhola
left a comment
There was a problem hiding this comment.
Regarding that logging, I was just playing around to get a sense whether these stats were updated frequently enough to reflect changes. The results indicated that they are. I don't think we need any observability in this regard, but I'll take a look at what you added.
@sumeerbhola made 1 comment.
Reviewable status:complete! 0 of 0 LGTMs obtained.
sumeerbhola
left a comment
There was a problem hiding this comment.
some simplification etc. comments
@sumeerbhola partially reviewed 3 files and made 4 comments.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @angeladietz).
pkg/util/admission/granter.go line 648 at r2 (raw file):
cumUnaccountedError := *cumError - *accountedError if cumUnaccountedError != 0 { *accountedError += cumUnaccountedError
Can we get rid of accountedForError inside kvStoreTokenGranter now that we are accounting for all the error.
pkg/util/admission/granter.go line 856 at r2 (raw file):
writeTokensPerSec := sg.mu.diskTokensAvailable.writeByteTokens / adjustmentInterval if writeTokensPerSec < 0 { log.Dev.Infof(context.Background(), "Potential overadmission=%s/s", humanizeutil.IBytes(-writeTokensPerSec))
This was fine for the prototype, but I think we need to plumb it back to the ioLoadListener, so that it is recorded as part of the log statement it outputs every 15s. In case you haven't seen it in escalations, an example is here https://cockroachlabs.atlassian.net/wiki/spaces/CKB/pages/3561455699/Playbook+v24.1+Admission+Control+Overload+Escalations+Dashboard#Logs.
pkg/util/admission/io_load_listener.go line 440 at r2 (raw file):
// corrections. Set to 0.001 (1ms) to enable error correction on every tick when // loaded. const errorAdjustmentInterval = 0.001
We should get rid of the shouldAdjustForError method and anything it needed, including this const. We are now deliberately adjusting for the error every time we add to the token bucket.
Enable disk bandwidth error correction on every tick (1ms when loaded) rather than once per second. This improves responsiveness of token adjustment to actual disk I/O. Key changes: - Change errorAdjustmentInterval from 1 to 0.001 so shouldAdjustForError returns true on every tick - Add initialized flag to diskTokensError to skip the first call, which has no prior measurement to compute a delta from - Change adjustDiskTokenError to accept DiskStats directly and use GetDiskStats for instantaneous collection rather than GetPebbleMetrics - Log potential overadmission when disk write tokens go negative Epic: CRDB-47073 Informs: cockroachdb#162214 Release note: None
Add observability for cases where disk stats don't change across consecutive 1ms error correction ticks. This helps assess whether the 1ms tick rate is effective given the kernel's disk stat update frequency (typically 4ms on Linux). Introduce staleStatTracker which: - Counts consecutive ticks where BytesRead and BytesWritten are unchanged - Logs (rate-limited) when count exceeds thresholds of 20, 50, or 100 - Tracks cumulative counts of runs exceeding each threshold Epic: CRDB-47073 Informs: cockroachdb#162214 Release note: None
This function was previously used to ensure error correction only occured at a certain interval, but this logic was recently adjusted so we now error correct on every tick. Thus, this function isn't needed anymore. Epic: CRDB-47073 Informs: cockroachdb#162214 Release note: None
4ef2fc8 to
4751fea
Compare
Potential Bug(s) DetectedThe three-stage Claude Code analysis has identified potential bug(s) in this PR that may warrant investigation. Next Steps: Note: When viewing the workflow output, scroll to the bottom to find the Final Analysis Summary. After you review the findings, please tag the issue as follows:
|
Previously, error correction used accountedForError to track what portion of cumError had already been applied to available tokens, and would only deduct (cumError - accountedForError)/15 on each error tick. This dampening was needed because at coarser error correction intervals, applying the full error immediately caused large token oscillations and under-utilization (cockroachdb#160964). With the recent move to 1ms error correction ticks, the per-tick error is small enough to apply directly without dampening. This removes accountedForError and simplifies errorAdjFunc to subtract the interval error immediately. Epic: CRDB-47073 Informs: cockroachdb#162214 Release note: None
Previously, potential disk overadmission was logged independently in the kvStoreTokenGranter on every interval boundary, separate from the main "IO overload" log line that oncallers use during escalations. Now, the overadmission signal is plumbed through to the diskBandwidthLimiter and included in its SafeFormat output, so it appears in the existing 15s log line alongside the other disk bandwidth stats. This is done by stashing the disk write token balance (before the max(0) reset) in addAvailableTokens on the last tick, returning it via getDiskTokensUsedAndReset, and passing it through to computeElasticTokens where it is stored in diskBandwidthLimiterState. When the balance is negative, SafeFormat appends the overadmission rate. Epic: CRDB-47073 Informs: cockroachdb#162214 Release note: None
4751fea to
be9607b
Compare
Will rebase once #163342 is merged, this pr is for the 2nd two commits.
admission: implement 1ms disk bandwidth error correction
Enable disk bandwidth error correction on every tick (1ms when loaded)
rather than once per second. This improves responsiveness of token
adjustment to actual disk I/O.
Key changes:
returns true on every tick
has no prior measurement to compute a delta from
GetDiskStats for instantaneous collection rather than GetPebbleMetrics
Epic: CRDB-47073
Informs: #162214
Release note: None
admission: log extended durations of unchanged disk stats
Add observability for cases where disk stats don't change across
consecutive 1ms error correction ticks. This helps assess whether the
1ms tick rate is effective given the kernel's disk stat update
frequency (typically 4ms on Linux).
Introduce staleStatTracker which:
Epic: CRDB-47073
Informs: #162214
Release note: None