RFC #3: Improve API Realism for Tau3

**Problem**
Current APIs in Tau2 tasks may not reflect real-world usage or complexity, limiting the benchmark’s usefulness for evaluating agents in realistic scenarios.

**Proposal**

* P0: Update APIs to be more realistic and reflective of tools used in production.
* P1: Introduce failure modes or constraints that agents must handle.
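
As a starting point for discussion, the P1 item could be sketched as a wrapper that injects failures into an existing tool using a seeded random generator, so runs stay reproducible. This is only a minimal sketch under stated assumptions: `FailureInjector`, `ToolError`, `RateLimitError`, `TransientServerError`, and `get_order_status` are hypothetical names for illustration, not part of the current Tau2 codebase.

```python
import random
from typing import Any, Callable, Dict


class ToolError(Exception):
    """Hypothetical base class for errors surfaced to the agent."""


class RateLimitError(ToolError):
    """Hypothetical error modelled on an HTTP 429 response."""


class TransientServerError(ToolError):
    """Hypothetical error modelled on an HTTP 503 response."""


class FailureInjector:
    """Wraps a tool and makes a fixed fraction of calls fail.

    The RNG is seeded, so the same seed yields the same sequence of
    failures. That keeps evaluations repeatable while still forcing
    the agent to handle errors.
    """

    def __init__(self, failure_rate: float = 0.2, seed: int = 0) -> None:
        self.failure_rate = failure_rate
        self._rng = random.Random(seed)

    def wrap(self, tool: Callable[..., Any]) -> Callable[..., Any]:
        def wrapped(*args: Any, **kwargs: Any) -> Any:
            roll = self._rng.random()  # uniform in [0.0, 1.0)
            # Split the failure budget evenly between the two error types.
            if roll < self.failure_rate / 2:
                raise RateLimitError("429: rate limit exceeded, retry later")
            if roll < self.failure_rate:
                raise TransientServerError("503: service temporarily unavailable")
            return tool(*args, **kwargs)

        return wrapped


def get_order_status(order_id: str) -> Dict[str, Any]:
    """Hypothetical stand-in for an existing benchmark tool."""
    return {"order_id": order_id, "status": "shipped"}


# Usage: wrap the tool once per task run, with a seed fixed by the task config.
flaky_get_order_status = FailureInjector(failure_rate=0.3, seed=42).wrap(get_order_status)
```

Keeping the failure schedule a function of the seed is one possible way to balance realism with testability: a task can pin the seed so every agent faces the same failures, and `failure_rate=0.0` recovers the current behavior for existing tasks.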

**Discussion Points**

* Which APIs should be prioritized for realism improvements?
* How should failure modes be designed to balance realism with testability?
* Any compatibility concerns with existing tasks or evaluations?
* How do we define, and agree on, what counts as a "realistic" API?

**Next Steps (after consensus)**

* Break proposed changes into actionable Feature/Task issues.
* Assign issues to the `v3.0` milestone and open PRs against the `v3` branch.

