RFC #3: Improve API Realism for Tau3 #130

@victorb-sierra

Problem
Current APIs in Tau2 tasks may not reflect real-world usage or complexity, limiting the benchmark’s usefulness for evaluating agents in realistic scenarios.

Proposal

  • P0: Update APIs to be more realistic and reflective of tools used in production.
  • P1: Introduce failure modes or constraints that agents must handle.
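As one possible shape for the P1 item, the sketch below wraps a tool so that it fails on a scripted set of call indices. Scripting the failures, rather than drawing them at random, keeps a task reproducible while still requiring the agent to recover. Every name here (`FlakyTool`, `TransientToolError`, `get_order_status`, `call_with_retry`) is hypothetical and is not part of the existing Tau2 code; this is an illustration of the idea, not a proposed implementation.

```python
from dataclasses import dataclass
from typing import Any, Callable, FrozenSet


class TransientToolError(Exception):
    """Hypothetical error that an agent is expected to recover from."""


@dataclass
class FlakyTool:
    """Wraps a tool and raises on a fixed, scripted set of call indices."""

    func: Callable[..., Any]
    fail_on_calls: FrozenSet[int] = frozenset()
    calls: int = 0  # number of calls made so far, including failed ones

    def __call__(self, *args: Any, **kwargs: Any) -> Any:
        index = self.calls
        self.calls += 1
        if index in self.fail_on_calls:
            raise TransientToolError(f"simulated timeout on call {index}")
        return self.func(*args, **kwargs)


def get_order_status(order_id: str) -> dict:
    # Hypothetical stand-in for a task API; a real one would query task state.
    return {"order_id": order_id, "status": "shipped"}


def call_with_retry(tool: Callable[..., Any], *args: Any,
                    max_attempts: int = 3, **kwargs: Any) -> Any:
    """Reference behavior an agent might show: retry on transient errors."""
    for attempt in range(max_attempts):
        try:
            return tool(*args, **kwargs)
        except TransientToolError:
            # Re-raise once the retry budget is exhausted.
            if attempt == max_attempts - 1:
                raise


# The first call fails, the second succeeds.
flaky = FlakyTool(get_order_status, fail_on_calls=frozenset({0}))
result = call_with_retry(flaky, "A1")
```

Because the failure schedule is part of the task definition, an evaluation could assert both the final answer and the number of calls the agent needed.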

Discussion Points

  • Which APIs should be prioritized for realism improvements?
  • How should failure modes be designed to balance realism with testability?
  • Any compatibility concerns with existing tasks or evaluations?
  • How easy is it to agree on what counts as a "realistic" API?

Next Steps (after consensus)

  • Break proposed changes into actionable Feature/Task issues.
  • Assign issues to the v3.0 milestone and v3 branch for PRs.
