allocator: coordinate graceful exit of signaled members by williamhbaker · Pull Request #457 · gazette/core

williamhbaker · 2026-02-04T22:13:04Z

Allow scenarios like unattended upgrades where many members are signaled to exit at once. Members now mark themselves as exiting rather than zeroing their item limit, and the allocator gradually sheds their capacity from available excess slots, oldest first, capped so that enough members remain to satisfy replication.
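The shedding rule described above can be sketched roughly as follows. This is a simplified illustration and not the allocator's actual code: the `Member` type, its fields, and `shedCapacity` are hypothetical names, age is modeled as a plain integer, and the replication cap is assumed to mean "never fully drain a member if fewer than `maxReplication` members with capacity would remain".

```go
package main

import (
	"fmt"
	"sort"
)

// Member is a hypothetical, simplified view of an allocator member.
type Member struct {
	Name      string
	ItemLimit int   // Maximum item slots this member may hold.
	Exiting   bool  // Set by the member once it has been signaled to exit.
	Created   int64 // Assumed age ordering: smaller means older.
}

// shedCapacity returns how many slots to shed from each exiting member.
// Capacity is taken only from excess slots (total limit minus slots needed),
// oldest exiting member first, and stops before fully draining a member if
// that would leave fewer than maxReplication members with capacity.
func shedCapacity(members []Member, slotsNeeded, maxReplication int) map[string]int {
	total, remaining := 0, 0
	var exiting []Member
	for _, m := range members {
		total += m.ItemLimit
		if m.ItemLimit > 0 {
			remaining++
		}
		if m.Exiting && m.ItemLimit > 0 {
			exiting = append(exiting, m)
		}
	}
	sort.SliceStable(exiting, func(i, j int) bool {
		return exiting[i].Created < exiting[j].Created
	})

	excess := total - slotsNeeded
	shed := make(map[string]int)
	for _, m := range exiting {
		if excess <= 0 {
			break // No spare slots left; remaining exiting members must wait.
		}
		n := m.ItemLimit
		if n > excess {
			n = excess // Partial shed: only the excess can be released.
		}
		if n == m.ItemLimit {
			if remaining-1 < maxReplication {
				break // Fully draining this member would violate replication.
			}
			remaining--
		}
		shed[m.Name] = n
		excess -= n
	}
	return shed
}

func main() {
	members := []Member{
		{Name: "a", ItemLimit: 10, Exiting: true, Created: 1},
		{Name: "b", ItemLimit: 10, Exiting: true, Created: 2},
		{Name: "c", ItemLimit: 10, Exiting: true, Created: 3},
	}
	// 12 slots are needed (for example, 6 items at replication 2).
	shed := shedCapacity(members, 12, 2)
	for _, m := range members {
		fmt.Printf("%s sheds %d of %d\n", m.Name, shed[m.Name], m.ItemLimit)
	}
}
```

In this example all three members are exiting and 18 of the 30 slots are excess, so the oldest member `a` sheds its full limit, `b` sheds 8, and `c` sheds nothing until replacement capacity appears.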

A two-step deployment will be needed for this, since allocators running the old code won't know about the "Exiting" flag or how to shed load from exiting brokers. Initially, members running the new code will need to continue to zero their item limit when signaled to exit; a second deploy can then remove that zeroing to achieve the desired load shedding.

The first commit of this series (94104c1) is the as-new code; the second (258db2d) includes the backward compatibility adjustments. Once this PR is merged and fully deployed everywhere, a follow-up PR can revert that second commit, and deploying that revert finalizes the change.
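As a rough illustration of the two deployment phases, the member-side behavior on an exit signal might look like the sketch below. `MemberSpec`, `onExitSignal`, and the `legacyCompat` switch are hypothetical names for illustration only; in this PR the two phases are separate commits, not a runtime flag.

```go
package main

import "fmt"

// MemberSpec is a hypothetical stand-in for the spec a member advertises.
type MemberSpec struct {
	ItemLimit int
	Exiting   bool
}

// onExitSignal returns the spec a member would advertise after being signaled.
// With legacyCompat set (first deploy), the limit is also zeroed so that old
// allocators, which ignore Exiting, still drain the member. Without it (second
// deploy), only Exiting is set and the allocator decides how much to shed.
func onExitSignal(spec MemberSpec, legacyCompat bool) MemberSpec {
	spec.Exiting = true
	if legacyCompat {
		spec.ItemLimit = 0
	}
	return spec
}

func main() {
	spec := MemberSpec{ItemLimit: 100}
	fmt.Printf("first deploy:  %+v\n", onExitSignal(spec, true))
	fmt.Printf("second deploy: %+v\n", onExitSignal(spec, false))
}
```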

The benchmarks for the Allocator in allocator/benchmark_test.go were run before and after this change and produced identical results. Also added a scenario that simulates scaling down first, followed by replacement. The previous code cannot model this scenario, because it zero-limits members instead of assigning partial "shed" capacities, so there is no before-and-after comparison. It may still be informative that the down-then-up benchmark shows 10-20% more churn than the up-then-down case.

Manual tests performed:

- On a group of gazette brokers running only the as-new code (first commit), signal all of them to exit at the same time:
  - Brokers exit up to the point where the remaining brokers are at full capacity
  - Or, until minimum replication constraints require additional brokers to run
  - As restarted brokers come back online, assignments are moved to them and the remaining signaled brokers are able to exit
- On a group of gazette brokers running only the as-new code, signal half of them to exit at the same time, with the cluster <50% loaded. All the signaled brokers exit immediately, and their assignments are transferred to the other half.
- On a mixed cluster running the code from this PR with the backward compatibility changes included (second commit) on half the brokers, and the other half of the brokers running our current production image:
  - Signaling all of them (or enough to exceed the remainder's capacity) at the same time results in a deadlock, as it does now
  - Otherwise, the brokers can be stopped in any valid order that would work presently: those on the new code first, or those on the old code first

Allow scenarios like unattended upgrades where many members are signaled to exit at once. Members now mark themselves as exiting rather than zeroing their item limit, and the allocator gradually sheds their capacity from available excess slots, oldest first, capped so that enough members remain to satisfy replication. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

When deploying to existing clusters, old allocators don't understand the Exiting field and need to see a zeroed limit to drain items from exiting members. Revert this commit once the cluster is fully upgraded. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

williamhbaker requested a review from jgraettinger February 4, 2026 22:36

williamhbaker force-pushed the wb/exit-strategy branch from afa2aaa to ba91114 Compare February 6, 2026 17:44

williamhbaker changed the title ~~wip: new exit strategy for signaled members~~ @williamhbaker @claude allocator: coordinate graceful exit of signaled members Feb 6, 2026

williamhbaker changed the title ~~@williamhbaker @claude allocator: coordinate graceful exit of signaled members~~ allocator: coordinate graceful exit of signaled members Feb 6, 2026

williamhbaker force-pushed the wb/exit-strategy branch from ba91114 to d4cb4cc Compare February 6, 2026 21:23

williamhbaker marked this pull request as ready for review February 6, 2026 21:37

williamhbaker force-pushed the wb/exit-strategy branch 2 times, most recently from 6cbd88b to 258db2d Compare February 9, 2026 21:09

williamhbaker and others added 2 commits February 9, 2026 16:42

williamhbaker force-pushed the wb/exit-strategy branch from 258db2d to 42a4400 Compare February 9, 2026 21:43

williamhbaker wants to merge 2 commits into master from wb/exit-strategy
