Description
When agent_msg.validate() raises an exception, the entire evaluation is shut down.
Ideally, only the result for that particular task ID should be given a score of 0, and the remaining tasks should keep running. Sample logs from the run:
```
Traceback (most recent call last):
  File "/usr/local/home/gaganmadan/tau2-bench/.venv/bin/tau2", line 8, in <module>
    sys.exit(main())
             ~~~~^^
  File "/usr/local/home/gaganmadan/tau2-bench/src/tau2/cli.py", line 207, in main
    args.func(args)
    ~~~~~~~~~^^^^^^
  File "/usr/local/home/gaganmadan/tau2-bench/src/tau2/cli.py", line 141, in <lambda>
    func=lambda args: run_domain(
                      ~~~~~~~~~~^
        RunConfig(
        ^^^^^^^^^^
    ...<17 lines>...
        )
        ^
    )
    ^
  File "/usr/local/home/gaganmadan/tau2-bench/src/tau2/run.py", line 117, in run_domain
    simulation_results = run_tasks(
        domain=config.domain,
    ...<15 lines>...
        log_level=config.log_level,
    )
  File "/usr/local/home/gaganmadan/tau2-bench/src/tau2/run.py", line 348, in run_tasks
    res = list(executor.map(_run, *zip(*args)))
  File "/usr/lib/python3.13/concurrent/futures/_base.py", line 619, in result_iterator
    yield _result_or_cancel(fs.pop())
          ~~~~~~~~~~~~~~~~~^^^^^^^^^^
  File "/usr/lib/python3.13/concurrent/futures/_base.py", line 317, in _result_or_cancel
    return fut.result(timeout)
           ~~~~~~~~~~^^^^^^^^^
  File "/usr/lib/python3.13/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ~~~~~~~~~~~~~~~~~^^
  File "/usr/lib/python3.13/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/usr/lib/python3.13/concurrent/futures/thread.py", line 59, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/local/home/gaganmadan/tau2-bench/src/tau2/run.py", line 334, in _run
    raise e
  File "/usr/local/home/gaganmadan/tau2-bench/src/tau2/run.py", line 314, in _run
    simulation = run_task(
        domain=domain,
    ...<10 lines>...
        seed=seed,
    )
  File "/usr/local/home/gaganmadan/tau2-bench/src/tau2/run.py", line 464, in run_task
    simulation = orchestrator.run()
  File "/usr/local/home/gaganmadan/tau2-bench/src/tau2/orchestrator/orchestrator.py", line 255, in run
    self.step()
    ~~~~~~~~~^^
  File "/usr/local/home/gaganmadan/tau2-bench/src/tau2/orchestrator/orchestrator.py", line 322, in step
    agent_msg.validate()
    ~~~~~~~~~~~~~~~~~~^^
  File "/usr/local/home/gaganmadan/tau2-bench/src/tau2/data_model/message.py", line 116, in validate
    raise ValueError(
        f"AssistantMessage must have either content or tool calls. Got {self}"
    )
ValueError: AssistantMessage must have either content or tool calls. Got AssistantMessage
timestamp: 2025-09-10T14:06:41.179490
cost: 0.0039951
```
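The traceback shows why one bad task stops the whole run: _run in src/tau2/run.py re-raises the exception (line 334), and iterating over executor.map in run_tasks re-raises the first worker exception, aborting list(...). A minimal sketch of the per-task isolation requested above, assuming a thread-pool runner shaped like the one in the traceback. TaskResult, run_single_task, run_task_safely, run_tasks and run_tasks_unguarded are hypothetical stand-ins, not tau2-bench's actual API:

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Optional


@dataclass
class TaskResult:
    # Hypothetical result record; tau2-bench's real simulation result type differs.
    task_id: str
    reward: float
    error: Optional[str] = None


def run_single_task(task_id: str) -> TaskResult:
    # Hypothetical stand-in for run_task(); task "2" simulates an agent turn
    # that has neither content nor tool calls, so validation fails.
    if task_id == "2":
        raise ValueError("AssistantMessage must have either content or tool calls.")
    return TaskResult(task_id=task_id, reward=1.0)


def run_task_safely(task_id: str) -> TaskResult:
    # Catch the failure inside the worker so executor.map never re-raises it;
    # the failing task is scored 0 and the error text is kept for debugging.
    try:
        return run_single_task(task_id)
    except Exception as exc:
        return TaskResult(task_id=task_id, reward=0.0, error=f"{type(exc).__name__}: {exc}")


def run_tasks(task_ids: list[str], max_workers: int = 4) -> list[TaskResult]:
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map yields results in input order; every task now produces one.
        return list(executor.map(run_task_safely, task_ids))


def run_tasks_unguarded(task_ids: list[str], max_workers: int = 4) -> list[TaskResult]:
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Without the wrapper, iterating the map re-raises the worker's
        # ValueError and the whole batch is lost (the behavior in the log).
        return list(executor.map(run_single_task, task_ids))


if __name__ == "__main__":
    for result in run_tasks(["1", "2", "3"]):
        print(result)
```

Catching Exception broadly matches the request that any per-task failure becomes a score of 0; narrowing the except clause to ValueError would keep unrelated bugs loud.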