Error handling for `agent_msg.validate()` #42

When `agent_msg.validate()` [[Link](https://github.com/sierra-research/tau2-bench/blob/5ba9e3e56db57c5e4114bf7f901291f09b2c5619/src/tau2/orchestrator/orchestrator.py#L322)] raises an exception, the exception propagates out of the worker thread and aborts the entire evaluation run.

Ideally, only the result for the affected Task ID should be given a score of 0, and the remaining tasks should continue to run. Sample traceback from a run:

```
Traceback (most recent call last):
  File "/usr/local/home/gaganmadan/tau2-bench/.venv/bin/tau2", line 8, in <module>
    sys.exit(main())
             ~~~~^^
  File "/usr/local/home/gaganmadan/tau2-bench/src/tau2/cli.py", line 207, in main
    args.func(args)
    ~~~~~~~~~^^^^^^
  File "/usr/local/home/gaganmadan/tau2-bench/src/tau2/cli.py", line 141, in <lambda>
    func=lambda args: run_domain(
                      ~~~~~~~~~~^
        RunConfig(
        ^^^^^^^^^^
    ...<17 lines>...
        )
        ^
    )
    ^
  File "/usr/local/home/gaganmadan/tau2-bench/src/tau2/run.py", line 117, in run_domain
    simulation_results = run_tasks(
        domain=config.domain,
    ...<15 lines>...
        log_level=config.log_level,
    )
  File "/usr/local/home/gaganmadan/tau2-bench/src/tau2/run.py", line 348, in run_tasks
    res = list(executor.map(_run, *zip(*args)))
  File "/usr/lib/python3.13/concurrent/futures/_base.py", line 619, in result_iterator
    yield _result_or_cancel(fs.pop())
          ~~~~~~~~~~~~~~~~~^^^^^^^^^^
  File "/usr/lib/python3.13/concurrent/futures/_base.py", line 317, in _result_or_cancel
    return fut.result(timeout)
           ~~~~~~~~~~^^^^^^^^^
  File "/usr/lib/python3.13/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ~~~~~~~~~~~~~~~~~^^
  File "/usr/lib/python3.13/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/usr/lib/python3.13/concurrent/futures/thread.py", line 59, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/local/home/gaganmadan/tau2-bench/src/tau2/run.py", line 334, in _run
    raise e
  File "/usr/local/home/gaganmadan/tau2-bench/src/tau2/run.py", line 314, in _run
    simulation = run_task(
        domain=domain,
    ...<10 lines>...
        seed=seed,
    )
  File "/usr/local/home/gaganmadan/tau2-bench/src/tau2/run.py", line 464, in run_task
    simulation = orchestrator.run()
  File "/usr/local/home/gaganmadan/tau2-bench/src/tau2/orchestrator/orchestrator.py", line 255, in run
    self.step()
    ~~~~~~~~~^^
  File "/usr/local/home/gaganmadan/tau2-bench/src/tau2/orchestrator/orchestrator.py", line 322, in step
    agent_msg.validate()
    ~~~~~~~~~~~~~~~~~~^^
  File "/usr/local/home/gaganmadan/tau2-bench/src/tau2/data_model/message.py", line 116, in validate
    raise ValueError(
        f"AssistantMessage must have either content or tool calls. Got {self}"
    )
ValueError: AssistantMessage must have either content or tool calls. Got AssistantMessage
timestamp: 2025-09-10T14:06:41.179490
cost: 0.0039951
```
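
One possible shape for the requested behavior is sketched below. It is a minimal, standalone illustration and not the tau2 code: `TaskResult`, `run_isolated`, `run_all`, and `fake_run_one` are hypothetical names, and the real `_run` / `run_task` functions in `run.py` have different signatures and return types. The idea is only that each task is wrapped so an exception becomes a zero-score result instead of being re-raised through `executor.map`.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class TaskResult:
    # Hypothetical result record, not the tau2 data model.
    task_id: str
    reward: float
    error: Optional[str] = None


def run_isolated(task_id: str, run_one: Callable[[str], float]) -> TaskResult:
    """Run a single task and convert any exception into a zero-score result."""
    try:
        return TaskResult(task_id=task_id, reward=run_one(task_id))
    except Exception as exc:
        # Broad catch on purpose: one failing task should not abort the run.
        return TaskResult(
            task_id=task_id,
            reward=0.0,
            error=f"{type(exc).__name__}: {exc}",
        )


def run_all(
    task_ids: List[str],
    run_one: Callable[[str], float],
    max_workers: int = 4,
) -> List[TaskResult]:
    """Run all tasks concurrently; results come back in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(lambda t: run_isolated(t, run_one), task_ids))


def fake_run_one(task_id: str) -> float:
    # Stand-in for a real simulation; fails for one task like the traceback above.
    if task_id == "task_2":
        raise ValueError("AssistantMessage must have either content or tool calls.")
    return 1.0


if __name__ == "__main__":
    for result in run_all(["task_1", "task_2", "task_3"], fake_run_one):
        print(result)
```

Recording the error string alongside the zero reward keeps the failure visible in the final results, so an empty `AssistantMessage` can still be told apart from a task the agent genuinely failed.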
