Enhancement: Add vacation rental domain with host preference adaptation #142
Open
wuTims wants to merge 11 commits into sierra-research:main from
Add new vacation_rental domain for tau2-bench evaluation:
- Data model with Property, Booking, Guest, and Review entities
- Tools for property search, booking management, and guest support
- Task definitions and policy documentation

- Add 6 new models (HostProfile, Issue, GuestHistory, etc.)
- Add 9 new tools for host decisions and issue management
- Add 11 new tasks (IDs 13-23)
- Update db.json with host profiles, issues, decisions
- Update policy.md with new sections
- Add test suite (20 tests)
…quest_host_decision
- Remove unused host_user_id parameter from get_guest_history() documentation
…ases
- Add reservations RES019 (Frieda/LST005) and RES020 (Emeka/LST004)
- Add issues with inconclusive evidence (ISS_RES010_002, ISS_RES019_001)
- Add host decisions for disputed evidence scenarios
- Add host decision for repeat guest exception with strict host

…cenarios
Formatting improvements (tasks 0-23):
- Expand compact JSON to pretty-printed format
- Add reward_basis: ["ACTION"] to all evaluation criteria
- Add description field to submit_issue_report actions
New test scenarios (tasks 24-31):
- Task 24: Military deployment exception with Pierre (approved)
- Task 25: Repeat guest exception with strict Alessia (denied)
- Task 26: Honest mistake damage with Pierre (split cost)
- Task 27: First-time guest early check-in request
- Task 28: Service failure with reviews-focused host (proactive refund)
- Task 29: Disputed evidence with Ibrahim (partial compensation)
- Task 30: Disputed evidence with Alessia (denial)
- Task 31: Safety concern requiring human transfer
Add train/test/base/eval task splits matching other domains.
- Add split_tasks.json with train (21), test (14), eval (16), base (35)
- Add get_tasks_split() and update get_tasks() to filter by split
- Register task splits loader in registry
- Add tasks 32-34 (host profile and evidence evaluation scenarios)
- eval split: 69% nuanced scenarios, 31% basic coverage
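For readers unfamiliar with split-based task loading, the idea behind this commit can be sketched roughly as follows. This is a minimal illustration, not the code in the PR: `load_splits` and `filter_tasks` are hypothetical stand-ins for `get_tasks_split()` and `get_tasks()`, whose real signatures are not shown here, and the assumed file layout (a JSON object mapping split names to lists of task IDs) is a guess at what split_tasks.json contains.

```python
import json
from typing import Dict, List, Optional


def load_splits(split_json: str) -> Dict[str, List[str]]:
    """Parse a split definition: split name -> list of task IDs (assumed layout)."""
    return json.loads(split_json)


def filter_tasks(
    tasks: List[dict],
    splits: Dict[str, List[str]],
    split: Optional[str] = None,
) -> List[dict]:
    """Return all tasks, or only those whose ID is listed in the named split."""
    if split is None:
        return list(tasks)
    if split not in splits:
        raise ValueError(f"Unknown split {split!r}; expected one of {sorted(splits)}")
    wanted = set(splits[split])
    # Preserve the original task order rather than the order in the split file.
    return [task for task in tasks if task["id"] in wanted]


if __name__ == "__main__":
    # Toy data only; the real split sizes are train 21, test 14, eval 16, base 35.
    splits = load_splits('{"train": ["0", "1"], "test": ["2"]}')
    tasks = [{"id": "0"}, {"id": "1"}, {"id": "2"}]
    print([t["id"] for t in filter_tasks(tasks, splits, "test")])
```

Raising on an unknown split name, rather than returning an empty list, keeps a typo in the split argument from silently producing a run with zero tasks.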
- Change 'reason' to 'justification' in process_goodwill_refund call
- Matches tool signature and Task 34 usage
Summary
Adds a new vacation rental domain that tests agent preference adaptation through a three-layer decision model: domain policy → host profile → guest context.
Closes #112
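One way to picture the three-layer model is as an ordered chain in which each layer either decides or defers to the next. The sketch below is only an illustration of that reading: the rule contents, field names (`safety_concern`, `host_style`, `repeat_guest`), and outcome strings are all hypothetical, and the domain's actual precedence logic lives in policy.md and the host profiles, not in code like this.

```python
from typing import Callable, Optional, Tuple

Request = dict
# A layer returns an outcome string, or None to defer to the next layer.
Layer = Callable[[Request], Optional[str]]


def domain_policy(request: Request) -> Optional[str]:
    # Hypothetical rule: safety concerns always go to a human.
    if request.get("safety_concern"):
        return "transfer_to_human"
    return None


def host_profile(request: Request) -> Optional[str]:
    # Hypothetical rule: a strict host declines exceptions outright.
    if request.get("host_style") == "strict":
        return "deny"
    return None


def guest_context(request: Request) -> Optional[str]:
    # Hypothetical rule: a repeat guest is granted the exception.
    if request.get("repeat_guest"):
        return "approve"
    return None


# Order encodes precedence: domain policy, then host profile, then guest context.
LAYERS = [
    ("domain_policy", domain_policy),
    ("host_profile", host_profile),
    ("guest_context", guest_context),
]


def decide(request: Request, default: str = "deny") -> Tuple[str, str]:
    """Return (deciding layer, outcome) from the first layer that does not defer."""
    for name, layer in LAYERS:
        outcome = layer(request)
        if outcome is not None:
            return name, outcome
    return "default", default


if __name__ == "__main__":
    print(decide({"host_style": "strict", "repeat_guest": True}))
```

Under these assumed rules, a repeat guest asking a strict host for an exception is decided at the host layer, so guest history never gets a say; a more flexible host defers, and the guest context then decides.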
Changes Made
- vacation_rental with 35 tasks across 8 listings and 3 host archetypes
Testing
- pytest tests/test_domains/test_vacation_rental/
- ruff check
Documentation
- data/tau2/domains/vacation_rental/README.md - domain design and three-layer model
- data/tau2/domains/vacation_rental/policy.md - agent policy document