Conversation
|
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request. |
Summary of ChangesHello @jaxxjj, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request resolves critical race conditions within the parallel agent system, specifically impacting scenarios where agents interact with tools. The previous implementation allowed concurrent goroutines to access shared session state and global telemetry variables without proper synchronization, leading to data inconsistencies and intermittent failures. The solution introduces robust synchronization mechanisms, including read-write mutexes for session state and telemetry, and a backpressure system utilizing acknowledgment channels for parallel agents. This ensures that all events are fully processed and shared state is consistently updated before sub-agents proceed, significantly enhancing the stability and correctness of parallel agent execution. Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request effectively addresses a critical race condition in parallel agents by introducing an acknowledgment channel (ackChan) for backpressure, ensuring that sub-agents wait for event processing to complete before proceeding. The changes are well-structured and include a targeted test case that validates the fix. Additionally, mutexes have been correctly added to MutableSession and the telemetry package to protect shared state from concurrent access, resolving other potential data races. My review includes a couple of suggestions for simplification and improved error visibility.
Adds backpressure mechanism to prevent sub-agents from reading stale session state. Uses acknowledgment channels to ensure event processing completes before the next iteration continues. Co-Authored-By: jaxxjj <yc5082@nyu.edu>
Fix race condition in parallel agents with tools
Problem
Parallel agents using tools exhibit a data race: concurrent goroutines access shared session state without proper synchronization. This causes intermittent failures where agents build LLM requests with incomplete session history, missing recently executed tool responses.
Root Cause
Architecture
Parallel agent spawns N goroutines for N sub-agents, each executing:
Runner goroutine processes these events:
Race Window
The race occurs in this sequence:
Between steps 3-5, agent reads session while runner is still writing. Writes in one goroutine are not guaranteed visible to reads in another without explicit synchronization.
Fix
Implement backpressure control using acknowledgment channels. Agent goroutines wait for explicit confirmation that event processing is complete before proceeding to next iteration.
Changes
1. Session State Protection
File:
internal/sessioninternal/mutablesession.goIssue:
storedSessionfield accessed concurrently without synchronization.Fix: Added
sync.RWMutexto protect allstoredSessionfield accesses2. Backpressure Control in Parallel Agent
File:
agent/workflowagents/parallelagent/agent.goIssue: Sub-agents continue iteration before runner completes event processing.
Fix: Implemented acknowledgment channel protocol:
ackChan chan struct{}toresultstruct<-ackChanafter sending eventyield()returnsyield()return implies runner has appended event to session3. OpenTelemetry Tracer Protection
File:
internal/telemetry/telemetry.goIssue:
localTracerglobal variable accessed concurrently without synchronization.Fix: Added
sync.RWMutexto protect tracer accessTesting
Race Detector Results
Before fix:
race conditions in session and otel
After fix:
$ go test -race ./agent/workflowagents/parallelagent ok google.golang.org/adk/agent/workflowagents/parallelagent 3.264sAlignment with Python ADK
Python ADK implementation uses identical synchronization pattern:
Python (
asyncio.Event):Go (
chan struct{}):