Enhancement: Add vacation rental domain with host preference adaptation#142

Open
wuTims wants to merge 11 commits into sierra-research:main from wuTims:domain/vacation-rental

wuTims commented Jan 15, 2026

Summary

Adds a new vacation rental domain that tests how well agents adapt to host preferences, using a three-layer decision model: domain policy → host profile → guest context.
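One possible reading of how the three layers compose is sketched below: domain policy acts as a hard filter, the host profile supplies a default decision plus an optional exception decision, and guest context decides whether the exception path applies. The class names, fields, action strings, and the "deny" fallback are all hypothetical and are not taken from the PR's data model or tools.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class DomainPolicy:
    # Layer 1 (hypothetical): every action the domain policy permits.
    allowed_actions: FrozenSet[str]
    fallback_action: str = "deny"


@dataclass(frozen=True)
class HostProfile:
    # Layer 2 (hypothetical): the host's default choice and, optionally,
    # the choice it makes when a guest's situation warrants an exception.
    name: str
    default_action: str
    exception_action: Optional[str] = None


@dataclass(frozen=True)
class GuestContext:
    # Layer 3 (hypothetical): whether this guest qualifies for the host's exception path.
    qualifies_for_exception: bool


def decide(policy: DomainPolicy, host: HostProfile, guest: GuestContext) -> str:
    """Pick the host's decision for this guest, then enforce the domain policy."""
    if guest.qualifies_for_exception and host.exception_action is not None:
        candidate = host.exception_action
    else:
        candidate = host.default_action
    # Domain policy always wins: anything it does not permit falls back.
    if candidate not in policy.allowed_actions:
        return policy.fallback_action
    return candidate


policy = DomainPolicy(allowed_actions=frozenset({"approve", "deny", "partial_refund"}))
flexible_host = HostProfile("flexible", default_action="deny", exception_action="approve")
strict_host = HostProfile("strict", default_action="deny")
guest = GuestContext(qualifies_for_exception=True)

print(decide(policy, flexible_host, guest))  # approve
print(decide(policy, strict_host, guest))    # deny
```

The same guest context yields different outcomes for different host profiles, which is the behavior the PR's approved-versus-denied exception tasks exercise; pre-computing each host's decision is what makes that outcome checkable deterministically.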

Closes #112

Changes Made

  • New domain: vacation_rental with 35 tasks across 8 listings and 3 host archetypes
  • Host profiles with distinct philosophies (reviews-focused, revenue-focused, relationships-focused)
  • Pre-computed host decisions enabling deterministic evaluation of preference adaptation
  • Task splits: train (21), test (14), eval (16), base (35)
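A minimal sketch of split-aware task filtering follows, assuming a split file that maps each split name to a list of task IDs. The real `split_tasks.json` layout and the signatures of `get_tasks_split()` and `get_tasks()` are not shown here, so `filter_tasks_by_split` and `check_train_test_partition` are hypothetical helpers. The second helper checks a property the counts are consistent with but the PR does not state outright: train (21) and test (14) together cover exactly the 35 base tasks, while eval (16) is a separately curated subset.

```python
from typing import Dict, List, Optional

Task = Dict[str, str]


def filter_tasks_by_split(
    tasks: List[Task], splits: Dict[str, List[str]], split_name: Optional[str] = None
) -> List[Task]:
    """Return the tasks whose IDs are listed under split_name (all tasks if None)."""
    if split_name is None:
        return list(tasks)
    if split_name not in splits:
        raise ValueError(f"unknown split {split_name!r}; expected one of {sorted(splits)}")
    wanted = set(splits[split_name])
    return [task for task in tasks if task["id"] in wanted]


def check_train_test_partition(splits: Dict[str, List[str]]) -> bool:
    """True if train and test are disjoint and together equal base (an assumption)."""
    train, test, base = set(splits["train"]), set(splits["test"]), set(splits["base"])
    return train.isdisjoint(test) and train | test == base


# Illustrative data only: 35 task IDs "0".."34"; the PR's actual ID-to-split assignment is not reproduced here.
tasks = [{"id": str(i)} for i in range(35)]
ids = [t["id"] for t in tasks]
splits = {
    "base": ids,
    "train": ids[:21],
    "test": ids[21:],
    "eval": ids[19:35],  # 16 IDs, overlapping both train and test
}

print(len(filter_tasks_by_split(tasks, splits, "train")))  # 21
print(len(filter_tasks_by_split(tasks, splits, "eval")))   # 16
print(check_train_test_partition(splits))                  # True
```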

Testing

  • 20 unit tests covering all tools (pytest tests/test_domains/test_vacation_rental/)
  • All tests pass
  • Linting passes (ruff check)

Documentation

  • data/tau2/domains/vacation_rental/README.md - domain design and three-layer model
  • data/tau2/domains/vacation_rental/policy.md - agent policy document

wuTims added 11 commits January 15, 2026 06:41
Add new vacation_rental domain for tau2-bench evaluation:
- Data model with Property, Booking, Guest, and Review entities
- Tools for property search, booking management, and guest support
- Task definitions and policy documentation
- Add 6 new models (HostProfile, Issue, GuestHistory, etc.)
- Add 9 new tools for host decisions and issue management
- Add 11 new tasks (IDs 13-23)
- Update db.json with host profiles, issues, decisions
- Update policy.md with new sections
- Add test suite (20 tests)
- Remove unused host_user_id parameter from get_guest_history() documentation
…ases

- Add reservations RES019 (Frieda/LST005) and RES020 (Emeka/LST004)
- Add issues with inconclusive evidence (ISS_RES010_002, ISS_RES019_001)
- Add host decisions for disputed evidence scenarios
- Add host decision for repeat guest exception with strict host
…cenarios

Formatting improvements (tasks 0-23):
- Expand compact JSON to pretty-printed format
- Add reward_basis: ["ACTION"] to all evaluation criteria
- Add description field to submit_issue_report actions

New test scenarios (tasks 24-31):
- Task 24: Military deployment exception with Pierre (approved)
- Task 25: Repeat guest exception with strict Alessia (denied)
- Task 26: Honest mistake damage with Pierre (split cost)
- Task 27: First-time guest early check-in request
- Task 28: Service failure with reviews-focused host (proactive refund)
- Task 29: Disputed evidence with Ibrahim (partial compensation)
- Task 30: Disputed evidence with Alessia (denial)
- Task 31: Safety concern requiring human transfer
Add train/test/base/eval task splits matching other domains.

- Add split_tasks.json with train (21), test (14), eval (16), base (35)
- Add get_tasks_split() and update get_tasks() to filter by split
- Register task splits loader in registry
- Add tasks 32-34 (host profile and evidence evaluation scenarios)
- eval split: 69% nuanced scenarios (11 of 16 tasks), 31% basic coverage (5 of 16)
- Change 'reason' to 'justification' in process_goodwill_refund call
- Matches tool signature and Task 34 usage
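Setting `reward_basis: ["ACTION"]` scores a task on the actions (tool calls) the agent takes. The sketch below shows one simple form of action matching: it assumes each expected action is a tool name plus keyword arguments, and that a task passes only if every expected action appears among the agent's calls with identical arguments. This is an illustration, not tau2-bench's actual evaluator, and the `process_goodwill_refund` arguments other than `justification` (and all values) are made up. It shows why the `reason` → `justification` rename matters: under exact argument matching, a misnamed keyword makes an otherwise correct call fail.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class ToolCall:
    name: str
    arguments: Dict[str, Any] = field(default_factory=dict)


def action_reward(expected: List[ToolCall], actual: List[ToolCall]) -> float:
    """1.0 if every expected call appears in the trajectory with identical arguments, else 0.0."""
    for exp in expected:
        if not any(call.name == exp.name and call.arguments == exp.arguments for call in actual):
            return 0.0
    return 1.0


# Hypothetical argument names and values; only "justification" comes from the commit message.
good_call = ToolCall(
    "process_goodwill_refund",
    {"reservation_id": "RES010", "amount": 50, "justification": "service failure"},
)
old_call = ToolCall(
    "process_goodwill_refund",
    {"reservation_id": "RES010", "amount": 50, "reason": "service failure"},
)

print(action_reward([good_call], [good_call]))  # 1.0
print(action_reward([good_call], [old_call]))   # 0.0
```

Real evaluators may compare only a subset of arguments or normalize values; strict equality keeps the sketch easy to follow.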
wuTims changed the title from "domain: Add vacation rental domain with host preference adaptation" to "Enhancement: Add vacation rental domain with host preference adaptation" on Jan 15, 2026
victorb-sierra added the labels "enhancement" (New feature or request) and "new domain" (Proposal for a new domain) on Jan 21, 2026

Labels

enhancement (New feature or request), new domain (Proposal for a new domain)

Development

Successfully merging this pull request may close these issues.

New Domain Proposal - Vacation Rental

2 participants