New Domain Proposal - Vacation Rental #112

## Problem/Goal

We need a domain that enables testing of more nuanced customer service interactions where agents must exercise judgment within policy constraints.

The vacation rental domain (simulating platforms like Airbnb/VRBO) provides an ideal context for this because:

1. **Multi-layered decision context:** Agents must weigh domain policy, host-specific requirements, listing details, and user profile information. They must make judgment calls that account for host interests while still enforcing policy and escalating to a human agent when necessary.

2. **Rich scenario space:** With three separate perspectives (guest, host, and domain policy) plus listing context, the vacation rental domain supports a wide range of interactions, including booking modifications, property disputes, damage claims, communication mediation, safety concerns, and policy interpretation.

---

## Proposed Solution

### Phase 1: Research & Policy Foundation

- Research industry-standard policies for vacation rental platforms regarding cancellations, refunds, host obligations, and guest behavior
- Establish a comprehensive domain policy document grounded in real-world practices
- Define cancellation policy tiers (flexible, moderate, firm, strict) with clear refund rules 
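
As a rough illustration of how the cancellation tiers could map to refund rules, here is a minimal Python sketch. The four tier names come from this proposal; the `RefundRule` structure, the `refund_fraction` helper, and every cutoff and rate are hypothetical placeholders that the Phase 1 research would replace.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum


class CancellationTier(Enum):
    FLEXIBLE = "flexible"
    MODERATE = "moderate"
    FIRM = "firm"
    STRICT = "strict"


@dataclass(frozen=True)
class RefundRule:
    # Hypothetical structure: minimum notice before check-in for each outcome.
    full_refund_cutoff: timedelta
    partial_refund_cutoff: timedelta
    partial_refund_rate: float  # fraction refunded inside the partial window


# PLACEHOLDER numbers only; real values come from the Phase 1 policy research.
REFUND_RULES = {
    CancellationTier.FLEXIBLE: RefundRule(timedelta(days=1), timedelta(0), 0.5),
    CancellationTier.MODERATE: RefundRule(timedelta(days=5), timedelta(0), 0.5),
    CancellationTier.FIRM: RefundRule(timedelta(days=30), timedelta(days=7), 0.5),
    CancellationTier.STRICT: RefundRule(timedelta(days=60), timedelta(days=14), 0.5),
}


def refund_fraction(
    tier: CancellationTier, check_in: datetime, cancelled_at: datetime
) -> float:
    """Return the fraction of the payment to refund under the given tier."""
    rule = REFUND_RULES[tier]
    notice = check_in - cancelled_at  # negative once check-in has passed
    if notice >= rule.full_refund_cutoff:
        return 1.0
    if notice >= rule.partial_refund_cutoff:
        return rule.partial_refund_rate
    return 0.0
```

Keeping the rules in a lookup table rather than in branching code would let the policy document and the tool implementation be checked against each other line by line.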

### Phase 2: Baseline Implementation

- Create minimal data schema: users, listings (with host-selected cancellation policies), reservations
- Implement core agent tools: `get_user_details`, `get_reservation_details`, `get_listing_details`, `cancel_reservation`, `process_refund`, `transfer_to_human_agents`
- Implement user tools for tau2 dual tool-use: `get_user_id`, `get_reservation_id`
- Design base tasks: test simple cancellation and refund flows to ensure data, tools, and policy are functional
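
The minimal schema and two of the agent tools might look roughly like the sketch below. It uses plain dataclasses and an in-memory store rather than the framework's actual base classes, and all field names (`host_id`, `total_paid`, `status`, and so on) are assumptions for illustration, not a settled schema.

```python
from dataclasses import asdict, dataclass, field
from typing import Dict


@dataclass
class User:
    user_id: str
    name: str


@dataclass
class Listing:
    listing_id: str
    host_id: str
    cancellation_policy: str  # host-selected tier, e.g. "flexible"


@dataclass
class Reservation:
    reservation_id: str
    guest_id: str
    listing_id: str
    check_in: str  # ISO date string
    check_out: str
    total_paid: float
    status: str = "confirmed"


@dataclass
class VacationRentalDB:
    # Hypothetical in-memory stand-in for db.json.
    users: Dict[str, User] = field(default_factory=dict)
    listings: Dict[str, Listing] = field(default_factory=dict)
    reservations: Dict[str, Reservation] = field(default_factory=dict)


class VacationRentalTools:
    """Sketch of two agent tools; not the framework's real toolkit interface."""

    def __init__(self, db: VacationRentalDB) -> None:
        self.db = db

    def get_reservation_details(self, reservation_id: str) -> dict:
        reservation = self.db.reservations.get(reservation_id)
        if reservation is None:
            raise ValueError(f"Reservation {reservation_id} not found")
        return asdict(reservation)

    def cancel_reservation(self, reservation_id: str) -> dict:
        reservation = self.db.reservations.get(reservation_id)
        if reservation is None:
            raise ValueError(f"Reservation {reservation_id} not found")
        if reservation.status == "cancelled":
            raise ValueError(f"Reservation {reservation_id} is already cancelled")
        reservation.status = "cancelled"  # mutate state so tasks can assert on it
        return asdict(reservation)
```

Raising on unknown or already-cancelled reservations gives the base tasks a concrete error path to exercise before any policy logic is layered on.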

### Phase 3: Objective Policy Adherence Tasks (~13 tasks)

Implement baseline tasks:

- **Lookup Chain Verification (3 tasks):** Tests listing lookup
- **Refund Outcome Coverage (4 tasks):** Tests each refund type in the policy definition
- **Rejection Guards (2 tasks):** Validates error handling for invalid operations
- **Information Retrieval (2 tasks):** Tests disambiguation and user profile lookup
- **Exception Flows (2 tasks):** Tests policy overrides (free cancellation period, host-initiated cancellations)
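
One way to make these tasks objectively checkable is to compare the agent's tool calls against an expected sequence. The sketch below is only an illustration: `TaskSpec`, `check_trajectory`, and the field names are hypothetical and do not reflect the actual tasks.json format or the framework's evaluator.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple

ToolCall = Tuple[str, Dict[str, Any]]  # (tool name, arguments)


@dataclass
class TaskSpec:
    # Hypothetical task record, not the real tasks.json schema.
    task_id: str
    category: str
    expected_calls: List[ToolCall]
    forbidden_tools: List[str] = field(default_factory=list)


def check_trajectory(task: TaskSpec, actual_calls: List[ToolCall]) -> bool:
    """Pass if no forbidden tool was used and the expected calls occur in order."""
    if any(name in task.forbidden_tools for name, _ in actual_calls):
        return False
    # Shared iterator: each expected call must be found after the previous match.
    remaining = iter(actual_calls)
    return all(
        any(call == expected for call in remaining)
        for expected in task.expected_calls
    )


# Example rejection-guard task: the agent should look up the reservation
# but must not issue a refund.
rejection_guard = TaskSpec(
    task_id="rejection_guard_01",
    category="Rejection Guards",
    expected_calls=[("get_reservation_details", {"reservation_id": "R1"})],
    forbidden_tools=["process_refund"],
)
```

Matching the expected calls as an ordered subsequence tolerates extra harmless lookups while still failing any run that touches a forbidden tool.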

### Phase 4: Complex Judgment-Based Evaluations (Iteratively introduce tasks)

Once the objective baseline is established, iteratively define and implement more complex scenarios:

- Ambiguous documentation claims requiring interpretation
- Partial evidence for major events
- Host vs. guest dispute scenarios (leveraging tau2 dual tool-use)
- User claims that conflict with recorded data
- Mid-trip cancellation scenarios

---

## Impact

**New Components:**

- `data/tau2/domains/vacation_rental/` - Domain data (db.json, policy.md, tasks.json)
- `src/tau2/domains/vacation_rental/` - Domain implementation (tools, wiki)

**Framework Components:**

- Domain registry (register new domain)
- No changes to core evaluation framework expected

---

## Timeline

Estimated completion: Jan 10th, 2026

_Note: Phase 4 is ongoing; its tasks will be continually improved and expanded._

1. Policy research and foundation - 6hrs
2. Minimal data and tool implementation - 2hrs
3. Baseline tasks with validation - 6hrs
4. Expanded task coverage based on findings - 4 weeks

---

## Dependencies

- No external library dependencies beyond existing framework

