
Error handling for agent_msg.validate() #42

@GaganM

Description


When agent_msg.validate() raises an exception, the entire evaluation run aborts, not just the task in which the invalid message occurred.
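The traceback shows why one bad message stops the whole run. `_run` re-raises the exception (`raise e` in run.py), and `run_tasks` collects results with `list(executor.map(...))`. `Executor.map` re-raises a worker's exception when that result is retrieved, so `list()` stops at the first failure and the exception propagates out of `run_tasks`. Below is a minimal standalone reproduction. `fake_task` and the task IDs are hypothetical stand-ins for the real task runner.

```python
from concurrent.futures import ThreadPoolExecutor


def fake_task(task_id: str) -> float:
    """Hypothetical stand-in for one task run; task "2" fails validation."""
    if task_id == "2":
        raise ValueError("AssistantMessage must have either content or tool calls.")
    return 1.0


def run_all(task_ids: list[str]) -> list[float]:
    with ThreadPoolExecutor(max_workers=4) as executor:
        # map() re-raises a worker's exception when that result is reached,
        # so list() aborts and no results are returned for any task.
        return list(executor.map(fake_task, task_ids))


if __name__ == "__main__":
    try:
        run_all(["1", "2", "3"])
    except ValueError as e:
        print(f"whole run aborted: {e}")
```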

Ideally, only the affected task (by Task ID) should fail and receive a score of 0, while the remaining tasks continue to run. Sample log from a run:

Traceback (most recent call last):
  File "/usr/local/home/gaganmadan/tau2-bench/.venv/bin/tau2", line 8, in <module>
    sys.exit(main())
             ~~~~^^
  File "/usr/local/home/gaganmadan/tau2-bench/src/tau2/cli.py", line 207, in main
    args.func(args)
    ~~~~~~~~~^^^^^^
  File "/usr/local/home/gaganmadan/tau2-bench/src/tau2/cli.py", line 141, in <lambda>
    func=lambda args: run_domain(
                      ~~~~~~~~~~^
        RunConfig(
        ^^^^^^^^^^
    ...<17 lines>...
        )
        ^
    )
    ^
  File "/usr/local/home/gaganmadan/tau2-bench/src/tau2/run.py", line 117, in run_domain
    simulation_results = run_tasks(
        domain=config.domain,
    ...<15 lines>...
        log_level=config.log_level,
    )
  File "/usr/local/home/gaganmadan/tau2-bench/src/tau2/run.py", line 348, in run_tasks
    res = list(executor.map(_run, *zip(*args)))
  File "/usr/lib/python3.13/concurrent/futures/_base.py", line 619, in result_iterator
    yield _result_or_cancel(fs.pop())
          ~~~~~~~~~~~~~~~~~^^^^^^^^^^
  File "/usr/lib/python3.13/concurrent/futures/_base.py", line 317, in _result_or_cancel
    return fut.result(timeout)
           ~~~~~~~~~~^^^^^^^^^
  File "/usr/lib/python3.13/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ~~~~~~~~~~~~~~~~~^^
  File "/usr/lib/python3.13/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/usr/lib/python3.13/concurrent/futures/thread.py", line 59, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/local/home/gaganmadan/tau2-bench/src/tau2/run.py", line 334, in _run
    raise e
  File "/usr/local/home/gaganmadan/tau2-bench/src/tau2/run.py", line 314, in _run
    simulation = run_task(
        domain=domain,
    ...<10 lines>...
        seed=seed,
    )
  File "/usr/local/home/gaganmadan/tau2-bench/src/tau2/run.py", line 464, in run_task
    simulation = orchestrator.run()
  File "/usr/local/home/gaganmadan/tau2-bench/src/tau2/orchestrator/orchestrator.py", line 255, in run
    self.step()
    ~~~~~~~~~^^
  File "/usr/local/home/gaganmadan/tau2-bench/src/tau2/orchestrator/orchestrator.py", line 322, in step
    agent_msg.validate()
    ~~~~~~~~~~~~~~~~~~^^
  File "/usr/local/home/gaganmadan/tau2-bench/src/tau2/data_model/message.py", line 116, in validate
    raise ValueError(
        f"AssistantMessage must have either content or tool calls. Got {self}"
    )
ValueError: AssistantMessage must have either content or tool calls. Got AssistantMessage
timestamp: 2025-09-10T14:06:41.179490
cost: 0.0039951
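One way to get the requested behaviour is to catch the exception per task inside the worker function and record a failed result with a reward of 0, instead of re-raising it. The sketch below is self-contained and does not use tau2's actual classes. `Message`, `TaskResult`, `simulate_task` and `run_task_safely` are all hypothetical, and `Message.validate()` only mirrors the check reported in the log above. In tau2 itself, the equivalent change would presumably go in `_run` in src/tau2/run.py, which currently re-raises.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Optional


@dataclass
class Message:
    """Hypothetical stand-in for AssistantMessage."""
    content: Optional[str] = None
    tool_calls: Optional[list] = None

    def validate(self) -> None:
        # Mirrors the check from the traceback: content or tool calls required.
        if not self.content and not self.tool_calls:
            raise ValueError(
                f"AssistantMessage must have either content or tool calls. Got {self}"
            )


@dataclass
class TaskResult:
    task_id: str
    reward: float
    error: Optional[str] = None


def simulate_task(task_id: str, agent_msg: Message) -> TaskResult:
    """Hypothetical single-task run: validate the agent message, then score it."""
    agent_msg.validate()
    return TaskResult(task_id=task_id, reward=1.0)


def run_task_safely(task_id: str, agent_msg: Message) -> TaskResult:
    # Contain the failure to this task: score it 0 and keep the error text,
    # instead of letting the exception escape into executor.map().
    try:
        return simulate_task(task_id, agent_msg)
    except ValueError as e:
        return TaskResult(task_id=task_id, reward=0.0, error=str(e))


def run_tasks(tasks: dict[str, Message]) -> list[TaskResult]:
    with ThreadPoolExecutor(max_workers=4) as executor:
        return list(executor.map(run_task_safely, tasks.keys(), tasks.values()))


if __name__ == "__main__":
    results = run_tasks({
        "1": Message(content="Hello"),
        "2": Message(),  # empty message: fails validation
        "3": Message(tool_calls=[{"name": "lookup"}]),
    })
    for r in results:
        print(r.task_id, r.reward, r.error is not None)
    # 1 1.0 False
    # 2 0.0 True
    # 3 1.0 False
```

The sketch catches only ValueError, so unrelated failures (for example a crashed model client) still surface loudly. Catching the broader Exception is a reasonable alternative if any per-task failure should simply be scored 0.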

