Releases: sierra-research/tau2-bench
Release version 0.2.0 - Web-based leaderboard
Major new features:
- Live leaderboard at tau-bench.com
- Interactive model comparison and performance visualization
- Mobile-responsive design
- Comprehensive submission validation system
v0.1.3: Fixes llm args + remove default NL assertions checks (#23)
* Update README, update type in fig, add num tasks CLI.
* Made `pip install -e` the default. For non-editable installs, added an option to set `TAU_DATA_DIR` to point to the data, with a fallback to the local source if it is not set. Added the `tau2 check-data` CLI command so users can check their data install. Fixed the num-tasks flag. Fixed the display of task names in the CLI.
* Fix CLI parser for dict args. There is no way for `dict` to parse a CLI string into a dictionary, so `type=dict` is simply non-functional. This change fixes that by allowing users to pass JSON strings at the CLI to configure LLMs. I am using this to pass `api_key` and `api_base` for self-hosted LLMs on an OpenAI API-like endpoint.
* Fix brace escaping.
* Updated evaluator so that NL assertions are not run by default.

---------

Co-authored-by: Honghua Dong <dhh19951@gmail.com>
Co-authored-by: Alexander Conway <alex-dr@users.noreply.github.com>
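The dict-args fix described above can be illustrated with a minimal `argparse` sketch. This is not the project's actual parser: the flag name `--llm-args`, the helper `parse_json_dict`, and the example values are hypothetical and chosen only for illustration. It shows why `type=dict` cannot work (calling `dict` on a string raises an error) and how a JSON-parsing callable can stand in for it.

```python
import argparse
import json


def parse_json_dict(value: str) -> dict:
    """Parse a CLI string as a JSON object; reject anything else."""
    try:
        parsed = json.loads(value)
    except json.JSONDecodeError as exc:
        # ArgumentTypeError lets argparse print a clean usage error.
        raise argparse.ArgumentTypeError(f"invalid JSON: {exc}") from exc
    if not isinstance(parsed, dict):
        raise argparse.ArgumentTypeError("expected a JSON object")
    return parsed


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="example")
    # Hypothetical flag name; type=dict would fail here because
    # dict("...") cannot turn a string into a dictionary.
    parser.add_argument("--llm-args", type=parse_json_dict, default=None)
    return parser


# Simulated command line; the key and endpoint are placeholder values.
args = build_parser().parse_args(
    ["--llm-args", '{"api_key": "sk-test", "api_base": "http://localhost:8000/v1"}']
)
print(args.llm_args["api_base"])
```

In a shell, the JSON string would typically be wrapped in single quotes so that the inner double quotes and braces reach the program unchanged.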
v0.1.2
Made pip install -e the default. For non editable install, added opti…