Releases · sierra-research/tau2-bench

06 Oct 16:25

victorb-sierra

v0.2.0

f8de30c

Release version 0.2.0 - Web-based leaderboard

Latest

Major new features:

* Live leaderboard at tau-bench.com
* Interactive model comparison and performance visualization
* Mobile-responsive design
* Comprehensive submission validation system


26 Aug 23:36

victorb-sierra

v0.1.3

5ba9e3e

v0.1.3: Fixes llm args + remove default NL assertions checks (#23)

* update README, update type in fig, add num tasks cli

* Made `pip install -e` the default. For non-editable installs, added an option to set `TAU_DATA_DIR` to point to the data, with a fallback to the local source if it is not set. Added a `tau2 check-data` CLI command so users can check their data install. Fixed the num-tasks flag. Fixed the display of task names in the CLI.

* Fix CLI parser for dict args

There is no way for `dict` to parse a CLI string into a dictionary, so `type=dict` is simply non-functional. This change fixes that by allowing users to pass JSON strings at the CLI to configure LLMs. 

I am using this to pass `api_key` and `api_base` for self-hosted LLMs on an OpenAI API-like endpoint.

* Fix brace escaping

* Updated the evaluator so that NL assertions are not run by default
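
The `dict` parsing fix described in the "Fix CLI parser for dict args" note can be illustrated with a minimal sketch. It assumes an argparse-style parser; the flag name `--llm-args` and the helper `parse_json_dict` are hypothetical names chosen for illustration and are not taken from the tau2-bench source. The idea is to replace `type=dict` with a function that decodes a JSON string into a dictionary.

```python
import argparse
import json


def parse_json_dict(value: str) -> dict:
    """Parse a CLI string as a JSON object (hypothetical helper)."""
    try:
        parsed = json.loads(value)
    except json.JSONDecodeError as exc:
        # argparse turns ArgumentTypeError into a clean usage error.
        raise argparse.ArgumentTypeError(f"invalid JSON: {exc}") from exc
    if not isinstance(parsed, dict):
        raise argparse.ArgumentTypeError("expected a JSON object")
    return parsed


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    # Hypothetical flag name; type=dict would fail here because
    # dict("...") cannot build a dictionary from a plain string.
    parser.add_argument("--llm-args", type=parse_json_dict)
    return parser


# Example: passing api_key and api_base for a self-hosted endpoint.
args = build_parser().parse_args(
    ["--llm-args", '{"api_key": "sk-example", "api_base": "http://localhost:8000/v1"}']
)
print(args.llm_args["api_base"])
```

On a real command line the JSON value typically needs to be wrapped in single quotes so that the shell passes the braces and double quotes through unchanged.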

---------

Co-authored-by: Honghua Dong <dhh19951@gmail.com>
Co-authored-by: Alexander Conway <alex-dr@users.noreply.github.com>


17 Jul 21:43

victorb-sierra

v0.1.2

40f46d3

v0.1.2

Made pip install -e the default. For non editable install, added opti…

