fix: await processFn to prevent buffer pool race condition #8877
nflaig merged 1 commit into ChainSafe:unstable
The `using` keyword releases the buffer back to the pool when the block exits. Since processFn is async (returns a Promise), the buffer was being released before the DB write completed. If another serialization (checkpoint state or archive state) happened before the write finished, it would get the same buffer and call fill(0), corrupting the in-flight write. This could cause 'First offset must equal to fixedEnd' errors on restart when the corrupted state is read from the database.
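The race described above can be reproduced in miniature. The following is a minimal sketch, not the project's code: `TinyPool`, `fakeDbWrite`, `serialize`, and `demo` are hypothetical names, the fallback to fresh memory when the slot is busy is an assumption, and the block-exit release that `using` performs is modeled with an explicit `try`/`finally` so the example runs without explicit resource management support.

```typescript
// Hypothetical one-slot pool, not Lodestar's BufferPool. Like the pool described
// above, alloc() zero-fills the shared buffer before handing it out.
class TinyPool {
  private readonly shared = new Uint8Array(4);
  private inUse = false;

  alloc(): { bytes: Uint8Array; release: () => void } {
    if (this.inUse) {
      // Assumption: when the slot is busy, fall back to fresh memory.
      return { bytes: new Uint8Array(4), release: () => {} };
    }
    this.inUse = true;
    this.shared.fill(0);
    return { bytes: this.shared, release: () => { this.inUse = false; } };
  }
}

// Stand-in for the async DB write: it reads the bytes only after yielding once.
async function fakeDbWrite(bytes: Uint8Array): Promise<number[]> {
  await Promise.resolve();
  return Array.from(bytes); // what would have reached the database
}

async function serialize(pool: TinyPool, value: number, awaitWrite: boolean): Promise<number[]> {
  const { bytes, release } = pool.alloc();
  try {
    bytes.fill(value); // stand-in for serializing the state into the buffer
    const pending = fakeDbWrite(bytes);
    // Without `await`, `finally` runs as soon as the promise is returned.
    return awaitWrite ? await pending : pending;
  } finally {
    release(); // models what `using` does when the block exits
  }
}

async function demo(awaitWrite: boolean): Promise<number[]> {
  const pool = new TinyPool();
  const first = serialize(pool, 7, awaitWrite);
  const second = serialize(pool, 9, awaitWrite); // starts while the first write is in flight
  const written = await first;
  await second;
  return written;
}
```

Under these assumptions, `demo(false)` resolves to `[9, 9, 9, 9]`: the first write observes bytes from the second serialization because the buffer was released and reused too early. `demo(true)` resolves to `[7, 7, 7, 7]`, since the buffer stays reserved until the write has finished.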
Summary of Changes (posted by Gemini Code Assist)
This pull request resolves a significant data integrity issue by correcting a race condition during state serialization. By ensuring that buffers are not returned to the pool until all asynchronous write operations are complete, it prevents potential state corruption and system errors, enhancing the stability and reliability of the application.
Code Review
This pull request effectively resolves a critical race condition within the `serializeState` function. By adding `await` before `processFn(stateBytes)`, the buffer is now guaranteed to not be released back to the pool until the asynchronous operation completes. This directly prevents the state corruption and `First offset must equal to fixedEnd` errors described in the motivation. The added comment provides excellent clarity on the purpose of this crucial change, significantly improving code maintainability and understanding for future developers.
going to merge this so that we can cut another rc
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@ Coverage Diff @@
## unstable #8877 +/- ##
=========================================
Coverage 52.34% 52.34%
=========================================
Files 848 848
Lines 63488 63487 -1
Branches 4704 4704
=========================================
Hits 33230 33230
+ Misses 30189 30188 -1
Partials 69 69
Motivation
Fixes a race condition that can cause state corruption and `First offset must equal to fixedEnd` errors on restart.

See discussion: https://discord.com/channels/593655374469660673/1469368525180113078
Description
The `using` keyword in `serializeState.ts` releases the buffer back to the pool when the block exits. Since `processFn` is async (returns a Promise), the buffer was being released before the DB write completed.

If another serialization (checkpoint state or archive state) happened before the write finished, it would:

1. Get the same buffer from the pool
2. Call `fill(0)` on it (per `BufferPool.alloc` behavior)
3. Corrupt the in-flight write

This could cause `First offset must equal to fixedEnd 0 != <large number>` errors on restart when the corrupted state is read.

Fix

Add `await` before `processFn(stateBytes)` to ensure the buffer is not released until the async operation completes.

AI Disclosure: This PR was authored with AI assistance (Lodekeeper/Claude).