Skip to content

admission: implement 1ms disk bandwidth error correction#163347

Open
angeladietz wants to merge 5 commits intocockroachdb:masterfrom
angeladietz:store-ac-error-correction
Open

admission: implement 1ms disk bandwidth error correction#163347
angeladietz wants to merge 5 commits intocockroachdb:masterfrom
angeladietz:store-ac-error-correction

Conversation

@angeladietz
Copy link
Contributor

@angeladietz angeladietz commented Feb 11, 2026

Will rebase once #163342 is merged, this pr is for the 2nd two commits.


admission: implement 1ms disk bandwidth error correction

Enable disk bandwidth error correction on every tick (1ms when loaded)
rather than once per second. This improves responsiveness of token
adjustment to actual disk I/O.

Key changes:

  • Change errorAdjustmentInterval from 1 to 0.001 so shouldAdjustForError
    returns true on every tick
  • Add initialized flag to diskTokensError to skip the first call, which
    has no prior measurement to compute a delta from
  • Change adjustDiskTokenError to accept DiskStats directly and use
    GetDiskStats for instantaneous collection rather than GetPebbleMetrics
  • Log potential overadmission when disk write tokens go negative

Epic: CRDB-47073
Informs: #162214
Release note: None


admission: log extended durations of unchanged disk stats

Add observability for cases where disk stats don't change across
consecutive 1ms error correction ticks. This helps assess whether the
1ms tick rate is effective given the kernel's disk stat update
frequency (typically 4ms on Linux).

Introduce staleStatTracker which:

  • Counts consecutive ticks where BytesRead and BytesWritten are unchanged
  • Logs (rate-limited) when count exceeds thresholds of 20, 50, or 100
  • Tracks cumulative counts of runs exceeding each threshold

Epic: CRDB-47073
Informs: #162214
Release note: None

@trunk-io
Copy link
Contributor

trunk-io bot commented Feb 11, 2026

Merging to master in this repository is managed by Trunk.

  • To merge this pull request, check the box to the left or comment /trunk merge below.

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@angeladietz
Copy link
Contributor Author

@sumeerbhola My last commit admission: log extended durations of unchanged disk stats is to emulate the logging you added in your prototype here, but i think we could get visibility into the same info by adding a histogram. Do you have a preference between my approach and using a histogram? I personally thought a histogram might be unnecessary since its more so for us to get a sense of how useful 1ms error correction is and less so an indicator for cluster health.

@angeladietz angeladietz marked this pull request as ready for review February 12, 2026 18:44
@angeladietz angeladietz requested review from a team as code owners February 12, 2026 18:44
@angeladietz angeladietz requested review from sumeerbhola and removed request for a team February 12, 2026 19:09
Copy link
Collaborator

@sumeerbhola sumeerbhola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding that logging, I was just playing around to get a sense whether these stats were updated frequently enough to reflect changes. The results indicated that they are. I don't think we need any observability in this regard, but I'll take a look at what you added.

@sumeerbhola made 1 comment.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained.

Copy link
Collaborator

@sumeerbhola sumeerbhola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some simplification etc. comments

@sumeerbhola partially reviewed 3 files and made 4 comments.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @angeladietz).


pkg/util/admission/granter.go line 648 at r2 (raw file):

		cumUnaccountedError := *cumError - *accountedError
		if cumUnaccountedError != 0 {
			*accountedError += cumUnaccountedError

Can we get rid of accountedForError inside kvStoreTokenGranter now that we are accounting for all the error.


pkg/util/admission/granter.go line 856 at r2 (raw file):

		writeTokensPerSec := sg.mu.diskTokensAvailable.writeByteTokens / adjustmentInterval
		if writeTokensPerSec < 0 {
			log.Dev.Infof(context.Background(), "Potential overadmission=%s/s", humanizeutil.IBytes(-writeTokensPerSec))

This was fine for the prototype, but I think we need to plumb it back to the ioLoadListener, so that it is recorded as part of the log statement it outputs every 15s. In case you haven't seen it in escalations, an example is here https://cockroachlabs.atlassian.net/wiki/spaces/CKB/pages/3561455699/Playbook+v24.1+Admission+Control+Overload+Escalations+Dashboard#Logs.


pkg/util/admission/io_load_listener.go line 440 at r2 (raw file):

// corrections. Set to 0.001 (1ms) to enable error correction on every tick when
// loaded.
const errorAdjustmentInterval = 0.001

We should get rid of the shouldAdjustForError method and anything it needed, including this const. We are now deliberately adjusting for the error every time we add to the token bucket.

Enable disk bandwidth error correction on every tick (1ms when loaded)
rather than once per second. This improves responsiveness of token
adjustment to actual disk I/O.

Key changes:
- Change errorAdjustmentInterval from 1 to 0.001 so shouldAdjustForError
  returns true on every tick
- Add initialized flag to diskTokensError to skip the first call, which
  has no prior measurement to compute a delta from
- Change adjustDiskTokenError to accept DiskStats directly and use
  GetDiskStats for instantaneous collection rather than GetPebbleMetrics
- Log potential overadmission when disk write tokens go negative

Epic: CRDB-47073
Informs: cockroachdb#162214
Release note: None
Add observability for cases where disk stats don't change across
consecutive 1ms error correction ticks. This helps assess whether the
1ms tick rate is effective given the kernel's disk stat update
frequency (typically 4ms on Linux).

Introduce staleStatTracker which:
- Counts consecutive ticks where BytesRead and BytesWritten are unchanged
- Logs (rate-limited) when count exceeds thresholds of 20, 50, or 100
- Tracks cumulative counts of runs exceeding each threshold

Epic: CRDB-47073
Informs: cockroachdb#162214
Release note: None
This function was previously used to ensure error correction only
occured at a certain interval, but this logic was recently adjusted so
we now error correct on every tick. Thus, this function isn't needed
anymore.

Epic: CRDB-47073
Informs: cockroachdb#162214
Release note: None
@angeladietz angeladietz force-pushed the store-ac-error-correction branch from 4ef2fc8 to 4751fea Compare February 16, 2026 20:15
@github-actions
Copy link

Potential Bug(s) Detected

The three-stage Claude Code analysis has identified potential bug(s) in this PR that may warrant investigation.

Next Steps:
Please review the detailed findings in the workflow run.

Note: When viewing the workflow output, scroll to the bottom to find the Final Analysis Summary.

After you review the findings, please tag the issue as follows:

  • If the detected issue is real or was helpful in any way, please tag the issue with O-AI-Review-Real-Issue-Found
  • If the detected issue was not helpful in any way, please tag the issue with O-AI-Review-Not-Helpful

@github-actions github-actions bot added the o-AI-Review-Potential-Issue-Detected AI reviewer found potential issue. Never assign manually—auto-applied by GH action only. label Feb 16, 2026
Previously, error correction used accountedForError to track what
portion of cumError had already been applied to available tokens, and
would only deduct (cumError - accountedForError)/15 on each error tick.
This dampening was needed because at coarser error correction intervals,
applying the full error immediately caused large token oscillations and
under-utilization (cockroachdb#160964).

With the recent move to 1ms error correction ticks, the per-tick error
is small enough to apply directly without dampening. This removes
accountedForError and simplifies errorAdjFunc to subtract the interval
error immediately.

Epic: CRDB-47073
Informs: cockroachdb#162214
Release note: None
Previously, potential disk overadmission was logged independently in
the kvStoreTokenGranter on every interval boundary, separate from the
main "IO overload" log line that oncallers use during escalations. Now,
the overadmission signal is plumbed through to the diskBandwidthLimiter
and included in its SafeFormat output, so it appears in the existing
15s log line alongside the other disk bandwidth stats.

This is done by stashing the disk write token balance (before the
max(0) reset) in addAvailableTokens on the last tick, returning it via
getDiskTokensUsedAndReset, and passing it through to
computeElasticTokens where it is stored in diskBandwidthLimiterState.
When the balance is negative, SafeFormat appends the overadmission rate.

Epic: CRDB-47073
Informs: cockroachdb#162214
Release note: None
@angeladietz angeladietz force-pushed the store-ac-error-correction branch from 4751fea to be9607b Compare February 16, 2026 21:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

o-AI-Review-Potential-Issue-Detected AI reviewer found potential issue. Never assign manually—auto-applied by GH action only.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants