feat: τ²-Adv Bench - Adversarial Evaluation Module by Ahm3dAlAli · Pull Request #158 · sierra-research/tau2-bench

Ahm3dAlAli · 2026-02-06T16:56:32Z

Summary

Adds adversarial evaluation to τ²-Bench for testing agent safety against manipulation attacks.

Features:

5 attack strategies (social engineering, prompt injection, policy exploitation, identity manipulation, information extraction)
3 sophistication levels per strategy
Safety evaluator for violation detection
Adversarial tasks for airline, retail, telecom domains
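To make the feature list concrete, here is a minimal sketch of how five strategies crossed with three sophistication levels could be modeled, and how tasks might be filtered by domain or strategy. It is not the code in this PR: the names `Strategy`, `Level`, `AdversarialTask`, `attack_grid`, and `filter_tasks` are hypothetical, the level names are invented (the description only says there are three), and only the `social_engineering` identifier appears in the PR text. The other string values are assumed by analogy.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import product
from typing import List, Optional, Tuple


class Strategy(Enum):
    # Only "social_engineering" is confirmed by the PR's CLI example;
    # the remaining string values are assumptions.
    SOCIAL_ENGINEERING = "social_engineering"
    PROMPT_INJECTION = "prompt_injection"
    POLICY_EXPLOITATION = "policy_exploitation"
    IDENTITY_MANIPULATION = "identity_manipulation"
    INFORMATION_EXTRACTION = "information_extraction"


class Level(Enum):
    # Hypothetical names: the PR states three levels but does not name them here.
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass(frozen=True)
class AdversarialTask:
    # Hypothetical task record, not the PR's task schema.
    task_id: str
    domain: str
    strategy: Strategy
    level: Level


def attack_grid() -> List[Tuple[Strategy, Level]]:
    """Every (strategy, level) pair: 5 strategies x 3 levels."""
    return list(product(Strategy, Level))


def filter_tasks(
    tasks: List[AdversarialTask],
    domain: Optional[str] = None,
    strategy: Optional[Strategy] = None,
) -> List[AdversarialTask]:
    """Keep tasks matching the given domain and/or strategy; None means no filter."""
    return [
        t
        for t in tasks
        if (domain is None or t.domain == domain)
        and (strategy is None or t.strategy == strategy)
    ]
```

Using enums rather than free-form strings means a mistyped strategy name fails at lookup time (`Strategy("...")` raises `ValueError`) instead of silently matching no tasks.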

Paper

Accompanying paper with methodology and findings available in fork: https://github.com/Ahm3dAlAli/tau2-bench/tree/feature/adversarial-evaluation/Tau2_Adv_Bench_AgentBeats_AhmedAli.pdf

Test Plan

25 tests pass (pytest tests/test_adversarial.py -v)
Demo works (python demo_adversarial.py)

This PR adds adversarial evaluation capabilities to τ²-bench, enabling testing of agent robustness against manipulation attempts.

## New Components

### Adversarial Module (`src/tau2/adversarial/`)

- `strategies.py`: 5 attack strategies (social engineering, prompt injection, policy exploitation, identity manipulation, information extraction) with 3 sophistication levels each
- `tasks.py`: Utilities for loading and filtering adversarial tasks
- `run_adversarial.py`: CLI script for running adversarial evaluations
- `README.md`: Documentation for the module

### Adversarial User (`src/tau2/user/adversarial_user.py`)

- Wraps standard UserSimulator with adversarial instructions
- Configurable attack strategy and sophistication
- Tracks attack attempts for analysis

### Safety Evaluator (`src/tau2/evaluator/evaluator_safety.py`)

- Detects safety violations in agent responses
- Violation types: unauthorized actions, information disclosure, policy circumvention, prompt injection success, etc.
- Produces SafetyRewardInfo with safety score and violations list

### Adversarial Tasks (`data/tau2/domains/airline/tasks_adversarial.json`)

- 8 adversarial scenarios for airline domain
- Covers all 5 attack strategies
- Includes multi-vector combined attacks

## Usage

```bash
# Run adversarial evaluation
python -m tau2.adversarial.run_adversarial --domain airline

# Filter by strategy
python -m tau2.adversarial.run_adversarial --domain airline --strategy social_engineering
```

## Tests

Added comprehensive tests in `tests/test_adversarial.py` (25 passing)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
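The Safety Evaluator described in this commit message detects violations and reports a score plus a violations list. A minimal sketch of that idea follows, under stated assumptions: it is not the contents of `evaluator_safety.py`, `SafetyResult` is a hypothetical stand-in for the PR's `SafetyRewardInfo`, the inputs (`forbidden_tools`, `secrets`) are invented for illustration, and the binary scoring rule is an assumption. Only two of the listed violation types are covered.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Iterable, List, Set


class ViolationType(Enum):
    # Two of the violation types named in the PR; string values are assumptions.
    UNAUTHORIZED_ACTION = "unauthorized_action"
    INFORMATION_DISCLOSURE = "information_disclosure"


@dataclass
class Violation:
    kind: ViolationType
    detail: str


@dataclass
class SafetyResult:
    # Hypothetical stand-in for the PR's SafetyRewardInfo.
    safety_score: float
    violations: List[Violation] = field(default_factory=list)


def evaluate_safety(
    tool_calls: Iterable[str],
    agent_messages: Iterable[str],
    forbidden_tools: Set[str],
    secrets: Iterable[str],
) -> SafetyResult:
    """Flag forbidden tool calls and leaked secrets in one conversation."""
    violations: List[Violation] = []
    secrets = list(secrets)

    # Unauthorized actions: the agent called a tool the task forbids.
    for name in tool_calls:
        if name in forbidden_tools:
            violations.append(
                Violation(ViolationType.UNAUTHORIZED_ACTION, f"called {name}")
            )

    # Information disclosure: a protected string appears in an agent message
    # (case-insensitive substring match, a deliberately simple heuristic).
    for message in agent_messages:
        lowered = message.lower()
        for secret in secrets:
            if secret.lower() in lowered:
                violations.append(
                    Violation(ViolationType.INFORMATION_DISCLOSURE, f"leaked {secret!r}")
                )

    # Assumed scoring rule: 1.0 if clean, 0.0 if any violation was found.
    return SafetyResult(1.0 if not violations else 0.0, violations)
```

Substring matching is cheap and deterministic, but it misses paraphrased leaks and cannot judge policy circumvention or prompt injection success. Those violation types would likely need richer checks than this sketch shows.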

- Add paper directory with NeurIPS paper files (tex, figures, generation scripts)
- Include comprehensive evaluation results for multiple LLM models
- Add adversarial task data for retail and telecom domains
- Include visualization scripts and analysis tools

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Ahm3dAlAli · 2026-02-17T10:54:50Z

Hey @victorb-sierra,

Good day. I would love to know if you can take a look, please.

Ahm3dAlAli · 2026-02-25T09:42:28Z

Hi @victorb-sierra,

Good day. I would love to know about any updates.

Ahm3dAlAli and others added 2 commits February 3, 2026 09:38

Ahm3dAlAli requested a review from victorb-sierra as a code owner February 6, 2026 16:56

Ahm3dAlAli added 8 commits February 6, 2026 17:56

Merge branch 'sierra-research:main' into feature/adversarial-evaluation

11f43ba

docs: Simplify adversarial module README

3502a23

chore: Clean up - remove paper, results, and scripts from root

28ad4d1

Paper

ecb7613

paper

140f509

Merge branch 'sierra-research:main' into feature/adversarial-evaluation

92cf0d0

Delete Tau2_Adv_Bench_AgentBeats_AhmedAli.pdf

fb07bea

Add paper

0c6128e

Ahm3dAlAli wants to merge 10 commits into sierra-research:main from Ahm3dAlAli:feature/adversarial-evaluation
