Enhancement: Add vacation rental domain with host preference adaptation by wuTims · Pull Request #142 · sierra-research/tau2-bench

wuTims · 2026-01-15T06:46:07Z

Summary

Adds a new vacation rental domain that tests agent preference adaptation through a three-layer decision model: domain policy → host profile → guest context.
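Here is a minimal Python sketch of how the three layers might compose. It is illustrative only. It assumes a hypothetical HostProfile with a philosophy field and an optional grants_exceptions preference, a hypothetical GuestContext, and a made-up rule for when guest context can tip the outcome. The domain's actual models and rules are defined in policy.md and db.json and may differ.

```python
from dataclasses import dataclass
from typing import Optional

# Layer 1 (assumed): the domain policy denies guest exceptions by default.
POLICY_GRANTS_EXCEPTIONS = False


@dataclass
class HostProfile:
    # Hypothetical fields; the PR's HostProfile model may look different.
    name: str
    philosophy: str  # e.g. "reviews", "revenue", or "relationships"
    grants_exceptions: Optional[bool] = None  # None means "no stated preference"


@dataclass
class GuestContext:
    repeat_guest: bool = False
    documented_hardship: bool = False  # e.g. a military deployment


def resolve_exception(host: HostProfile, guest: GuestContext) -> bool:
    """Resolve an exception request as domain policy -> host profile -> guest context."""
    # Layer 1: start from the domain policy default.
    decision = POLICY_GRANTS_EXCEPTIONS
    # Layer 2: an explicit host preference replaces the policy default.
    if host.grants_exceptions is not None:
        return host.grants_exceptions
    # Layer 3 (assumed rule): with no explicit preference, a relationships-focused
    # host lets the guest's circumstances decide.
    if host.philosophy == "relationships":
        decision = guest.repeat_guest or guest.documented_hardship
    return decision
```

In this sketch, a strict host's explicit "no" wins regardless of guest history, which is roughly the shape of Task 25. A relationships-focused host with no stated preference defers to the guest's circumstances.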

Closes #112

Changes Made

New domain: vacation_rental with 35 tasks across 8 listings and 3 host archetypes
Host profiles with distinct philosophies (reviews-focused, revenue-focused, relationships-focused)
Pre-computed host decisions enabling deterministic evaluation of preference adaptation
Task splits: train (21), test (14), eval (16), base (35)
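Pre-computing host decisions turns host behavior into a table lookup instead of a simulated conversation, so repeated runs get identical host answers. The sketch below is minimal and assumes a hypothetical mapping keyed by reservation and request type. The real layout of host decisions in db.json is not shown in this PR description, and the IDs below are placeholders.

```python
from typing import Dict, Tuple

# Placeholder data in a hypothetical shape; not taken from the PR's db.json.
HOST_DECISIONS: Dict[Tuple[str, str], Dict[str, object]] = {
    ("RES_EXAMPLE_1", "early_checkin"): {"approved": False, "note": "strict check-in window"},
    ("RES_EXAMPLE_2", "damage_claim"): {"approved": True, "guest_share": 0.5},
}


def lookup_host_decision(reservation_id: str, request_type: str) -> Dict[str, object]:
    """Return the stored host decision; identical inputs always give identical outputs."""
    key = (reservation_id, request_type)
    if key not in HOST_DECISIONS:
        # Fail loudly so a missing fixture never silently becomes an arbitrary answer.
        raise KeyError(f"no pre-computed host decision for {key}")
    # Return a copy so callers cannot mutate the shared fixture.
    return dict(HOST_DECISIONS[key])
```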

Testing

20 unit tests covering all tools (pytest tests/test_domains/test_vacation_rental/)
All tests pass
Linting passes (ruff check)

Documentation

data/tau2/domains/vacation_rental/README.md - domain design and three-layer model
data/tau2/domains/vacation_rental/policy.md - agent policy document

Add new vacation_rental domain for tau2-bench evaluation:
- Data model with Property, Booking, Guest, and Review entities
- Tools for property search, booking management, and guest support
- Task definitions and policy documentation

- Add 6 new models (HostProfile, Issue, GuestHistory, etc.)
- Add 9 new tools for host decisions and issue management
- Add 11 new tasks (IDs 13-23)
- Update db.json with host profiles, issues, and host decisions
- Update policy.md with new sections
- Add test suite (20 tests)

- Remove unused host_user_id parameter from get_guest_history() documentation

…ases
- Add reservations RES019 (Frieda/LST005) and RES020 (Emeka/LST004)
- Add issues with inconclusive evidence (ISS_RES010_002, ISS_RES019_001)
- Add host decisions for disputed evidence scenarios
- Add host decision for repeat guest exception with strict host

…cenarios

Formatting improvements (tasks 0-23):
- Expand compact JSON to pretty-printed format
- Add reward_basis: ["ACTION"] to all evaluation criteria
- Add description field to submit_issue_report actions

New test scenarios (tasks 24-31):
- Task 24: Military deployment exception with Pierre (approved)
- Task 25: Repeat guest exception with strict Alessia (denied)
- Task 26: Honest mistake damage with Pierre (split cost)
- Task 27: First-time guest early check-in request
- Task 28: Service failure with reviews-focused host (proactive refund)
- Task 29: Disputed evidence with Ibrahim (partial compensation)
- Task 30: Disputed evidence with Alessia (denial)
- Task 31: Safety concern requiring human transfer
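When reward_basis is ["ACTION"], a task is scored on whether the agent issued the expected tool calls. The sketch below is a simplified, hypothetical version of that check: exact name and argument equality, with call order ignored. tau2-bench's own evaluator may compare arguments more selectively, and the submit_issue_report arguments used here are placeholders.

```python
from typing import Any, Dict, List

ToolCall = Dict[str, Any]  # {"name": str, "arguments": dict}


def action_reward(expected_actions: List[ToolCall], agent_calls: List[ToolCall]) -> float:
    """Return 1.0 if every expected action appears among the agent's calls, else 0.0."""
    for expected in expected_actions:
        matched = any(
            call["name"] == expected["name"] and call["arguments"] == expected["arguments"]
            for call in agent_calls
        )
        if not matched:
            return 0.0
    return 1.0
```

Because this sketch compares arguments exactly, a keyword mismatch such as 'reason' vs 'justification' in an expected action would be enough to fail a correct agent. Task 28's argument fix addresses that class of bug.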

Add train/test/base/eval task splits matching other domains.
- Add split_tasks.json with train (21), test (14), eval (16), base (35)
- Add get_tasks_split() and update get_tasks() to filter by split
- Register task splits loader in registry
- Add tasks 32-34 (host profile and evidence evaluation scenarios)
- eval split: 69% nuanced scenarios, 31% basic coverage
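Below is a minimal sketch of split-based filtering. It assumes split_tasks.json maps each split name to a list of task IDs. load_task_splits() and filter_tasks() are hypothetical stand-ins; the actual get_tasks_split() and get_tasks() signatures in the PR may differ.

```python
import json
from typing import Dict, List, Optional


def load_task_splits(raw_json: str) -> Dict[str, List[str]]:
    """Parse split_tasks.json content (assumed shape: {"train": [ids], ...})."""
    splits = json.loads(raw_json)
    # Normalize IDs to strings so int and str IDs compare consistently.
    return {name: [str(task_id) for task_id in ids] for name, ids in splits.items()}


def filter_tasks(
    tasks: List[dict], splits: Dict[str, List[str]], split: Optional[str] = None
) -> List[dict]:
    """Return tasks in the named split, or every task when split is None."""
    if split is None:
        return list(tasks)
    if split not in splits:
        raise ValueError(f"unknown split {split!r}; expected one of {sorted(splits)}")
    wanted = set(splits[split])
    return [task for task in tasks if str(task["id"]) in wanted]
```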

- Change 'reason' to 'justification' in process_goodwill_refund call
- Matches tool signature and Task 34 usage
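The keyword fix matters because a Python function rejects an unexpected keyword argument with a TypeError. The sketch below uses a hypothetical goodwill_refund_sketch() that takes a justification parameter. It also illustrates the separate "default cap for goodwill refund without host profile" change. The 50.0 cap and the host_cap parameter are assumptions, not values from the PR.

```python
from typing import Optional

# Hypothetical fallback cap used when no host profile supplies one.
DEFAULT_GOODWILL_CAP = 50.0


def goodwill_refund_sketch(
    reservation_id: str,
    amount: float,
    justification: str,
    host_cap: Optional[float] = None,
) -> dict:
    """Issue a goodwill refund, clamped to the host's cap or to a default cap."""
    if amount <= 0:
        raise ValueError("refund amount must be positive")
    if not justification.strip():
        raise ValueError("a justification is required")
    # Without a host-specific cap, fall back to the default instead of allowing any amount.
    cap = DEFAULT_GOODWILL_CAP if host_cap is None else host_cap
    return {
        "reservation_id": reservation_id,
        "amount": min(amount, cap),
        "justification": justification,
    }
```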

wuTims added 11 commits January 15, 2026 06:41

feat(vacation-rental): add vacation rental domain

5781bd2

fix(vacation-rental): replace eval() with safe AST-based expression parser

b0a6572

fix(vacation-rental): add default cap for goodwill refund without host profile

579ada5

refactor(vacation-rental): remove unused reservation_id param from request_host_decision

df10dd3

docs(vacation-rental): fix get_guest_history function signature

a87cb22

fix(vacation-rental): correct argument name in Task 28

e335e55

docs(vacation-rental): add domain README with three-layer decision model design

360938e
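Commit b0a6572 replaces eval() with an AST-based parser. The PR does not show which expressions the domain evaluates, so what follows is a generic sketch of the technique. It parses with ast.parse(mode="eval") and walks a whitelist of arithmetic node types, rejecting everything else (names, calls, attribute access, exponentiation). The safe_eval name and the set of allowed operators are assumptions.

```python
import ast
import operator

# Whitelisted operators; anything not listed here is rejected.
_BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def safe_eval(expression: str) -> float:
    """Evaluate a plain arithmetic expression without calling eval()."""
    tree = ast.parse(expression, mode="eval")
    return _eval_node(tree.body)


def _eval_node(node: ast.AST):
    # Numeric literals only; bool is a subclass of int, so exclude it explicitly.
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)) and not isinstance(node.value, bool):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
        return _BIN_OPS[type(node.op)](_eval_node(node.left), _eval_node(node.right))
    if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
        return _UNARY_OPS[type(node.op)](_eval_node(node.operand))
    raise ValueError(f"disallowed expression: {ast.dump(node)}")
```

Because only whitelisted nodes are evaluated, input such as __import__('os') raises ValueError instead of executing.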

wuTims requested a review from victorb-sierra as a code owner January 15, 2026 06:46

wuTims changed the title from "domain: Add vacation rental domain with host preference adaptation" to "Enhancement: Add vacation rental domain with host preference adaptation" Jan 15, 2026

victorb-sierra added the enhancement (New feature or request) and new domain (Proposal for a new domain) labels Jan 21, 2026

wuTims wants to merge 11 commits into sierra-research:main from wuTims:domain/vacation-rental
