feat(a2a): Add A2A protocol support for remote agent evaluation #143
Open
wuTims wants to merge 4 commits into sierra-research:main
- Add A2AClient for HTTP communication with A2A agents
- Add message translation between tau2 and A2A formats
- Add protocol metrics collection for latency/token tracking
- Add custom exceptions for A2A error handling
- Support markdown code block extraction for tool calls
- Include system context injection on first message

- Implement A2AAgent extending LocalAgent interface
- Add async/sync bridge for HTTP operations
- Support context persistence across conversation turns
- Add protocol metrics export methods
- Include from_cli_args factory for CLI integration

- Add --agent-a2a-endpoint, --agent-a2a-auth-token, --agent-a2a-timeout flags
- Register a2a_agent in agent registry
- Add A2AAgent construction in run_task()
- Fix is_tool_call() to check for non-empty tool_calls list
- Add httpx and pytest-asyncio dependencies

- Add mock-based A2A client tests (agent, discovery, translation)
- Add protocol metrics and performance tests
- Add backward compatibility tests for CLI and LLMAgent
- Add regression tests for agent interface stability
Collaborator
Thank you for your PR. Added as a potential enhancement to the tau3 milestone.
Summary
This PR introduces Agent-to-Agent (A2A) protocol support to tau2-bench, enabling evaluation of remote agents via the A2A protocol. This allows benchmarking of agents deployed as services without requiring local code integration.
Key additions:
- `A2AAgent` implementation for remote agent evaluation
- `--agent-a2a-endpoint` flag
Closes #111
Note: The scope of this initial PR was reduced by removing the tau2-agent implementation. Since there is an existing implementation of "agentified-tau-bench", I thought that adding an a2a-agent in the CLI would be more impactful. The forked tau2-agent implementation using ADK can be added in experiments/ if needed. The full implementation is available at my fork: https://github.com/wuTims/tau2-bench-agent
Design Decisions
1. Protocol Translation Layer
The A2A protocol uses a different message format than tau2's internal representation. Rather than modifying core tau2 data models, we implemented a dedicated translation layer (`src/tau2/a2a/translation.py`) that:
- `UserMessage`/`AssistantMessage` to A2A `MessageSendParams`
2. Stateless Agent Design
`A2AAgent` maintains conversation state via A2A's `context_id` mechanism:
- `A2AAgentState` (immutable, returned with each response)
3. Tool Description Injection
Since A2A agents don't receive tools via function calling, we inject tool descriptions as structured text in the system context:
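As a rough illustration of the idea, the sketch below renders a list of tool descriptions into a text block that could be prepended to the first message. The helper name `render_tool_context`, the `<available_tools>` wrapper, and the tool dictionary shape are all assumptions made for this example; the actual format used in `src/tau2/a2a/translation.py` may differ.

```python
import json
from typing import Any


def render_tool_context(tools: list[dict[str, Any]]) -> str:
    """Render tool descriptions as structured text for the system context.

    Hypothetical helper: the layout is an assumption for illustration only.
    """
    lines = ["<available_tools>"]
    for tool in tools:
        lines.append(f"- {tool['name']}: {tool['description']}")
        # Serialize parameters as JSON so a remote agent can parse them back.
        params = json.dumps(tool.get("parameters", {}), sort_keys=True)
        lines.append(f"  parameters: {params}")
    lines.append("</available_tools>")
    return "\n".join(lines)


# Example tool definition (made up for this sketch).
example_tools = [
    {
        "name": "get_reservation",
        "description": "Look up a reservation by ID.",
        "parameters": {"reservation_id": "string"},
    }
]
tool_context = render_tool_context(example_tools)
```

Keeping the parameters as JSON inside otherwise plain text is one way to let the remote agent echo a well-formed tool call back without relying on native function calling.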
4. Backward Compatibility
All existing functionality is preserved:
- `llm_agent`
- `--agent a2a_agent`
Architecture
New Files
- src/tau2/a2a/__init__.py
- src/tau2/a2a/client.py
- src/tau2/a2a/models.py
- src/tau2/a2a/translation.py
- src/tau2/a2a/exceptions.py
- src/tau2/a2a/metrics.py
- src/tau2/agent/a2a_agent.py
Example Usage
Basic A2A Agent Evaluation
With Authentication
Full Evaluation Script
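A minimal sketch of how an evaluation run might be assembled from Python, using the `--agent a2a_agent`, `--agent-a2a-endpoint`, `--agent-a2a-auth-token`, and `--agent-a2a-timeout` flags named in this PR. The `tau2 run` subcommand and `--domain` flag are assumed from the existing tau2 CLI, and the helper name, endpoint URL, token, and timeout value are placeholders rather than values from the PR.

```python
import shlex
from typing import Optional


def build_a2a_eval_command(
    endpoint: str,
    domain: str = "airline",
    auth_token: Optional[str] = None,
    timeout: Optional[int] = None,
) -> list[str]:
    """Build an argv list for evaluating a remote A2A agent (hypothetical helper)."""
    cmd = [
        "tau2", "run",                     # assumed existing CLI entry point
        "--domain", domain,                # assumed existing flag
        "--agent", "a2a_agent",            # agent name registered by this PR
        "--agent-a2a-endpoint", endpoint,  # flag added by this PR
    ]
    # Optional flags added by this PR; only passed when a value is given.
    if auth_token is not None:
        cmd += ["--agent-a2a-auth-token", auth_token]
    if timeout is not None:
        cmd += ["--agent-a2a-timeout", str(timeout)]
    return cmd


# Placeholder endpoint, token, and timeout for illustration.
command = build_a2a_eval_command(
    "http://localhost:8001", auth_token="example-token", timeout=60
)
printable = shlex.join(command)  # shell-quoted form, useful for logging
```

The resulting list can be passed to `subprocess.run(command, check=True)` once a reachable A2A agent is available at the chosen endpoint.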
Test Coverage
- test_a2a_client/test_a2a_agent.py
- test_a2a_client/test_agent_discovery.py
- test_a2a_client/test_debug_logging.py
- test_a2a_client/test_message_translation.py
- test_a2a_client/test_metrics.py
- test_a2a_client/test_metrics_export.py
- test_a2a_client/test_performance.py
- test_backward_compatibility/test_cli.py
- test_backward_compatibility/test_llm_agent.py
- test_backward_compatibility/test_regression.py
Running Tests
Verified Evaluation
Successfully tested against a deployed A2A agent.