
New Domain Proposal - Vacation Rental #112

@wuTims

Description

Problem/Goal

We need a domain that enables testing of more nuanced customer service interactions where agents must exercise judgment within policy constraints.

The vacation rental domain (simulating platforms such as Airbnb and VRBO) is well suited to this because:

  1. Multi-layered decision context: Agents must weigh domain policy, host-specific requirements, listing details, and user profile information. They must make judgment calls that account for host interests while still enforcing policy and escalating to a human agent (human-in-the-loop) when necessary.

  2. Rich scenario space: Three separate parties (guest, host, and the platform's domain policy), together with listing context, give rise to many diverse interactions and scenarios, including booking modifications, property disputes, damage claims, communication mediation, safety concerns, and policy interpretation.


Proposed Solution

Phase 1: Research & Policy Foundation

  • Research industry-standard policies for vacation rental platforms regarding cancellations, refunds, host obligations, and guest behavior
  • Establish a comprehensive domain policy document grounded in real-world practices
  • Define cancellation policy tiers (flexible, moderate, firm, strict) with clear refund rules
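To make "clear refund rules" concrete, the four tiers can be expressed as data: each tier is an ordered list of (minimum days before check-in, refund percent) bands, and the first band the cancellation satisfies determines the refund. This is a minimal sketch only. The thresholds and percentages below are hypothetical placeholders, not the output of the Phase 1 research, and the names `CancellationPolicy`, `REFUND_SCHEDULE`, and `refund_amount_cents` are illustrative.

```python
from enum import Enum


class CancellationPolicy(Enum):
    FLEXIBLE = "flexible"
    MODERATE = "moderate"
    FIRM = "firm"
    STRICT = "strict"


# HYPOTHETICAL placeholder thresholds, not the researched policy.
# Each tier maps to (min_days_before_check_in, refund_percent) bands,
# ordered from the largest threshold to the smallest.
REFUND_SCHEDULE = {
    CancellationPolicy.FLEXIBLE: [(1, 100)],
    CancellationPolicy.MODERATE: [(5, 100)],
    CancellationPolicy.FIRM: [(30, 100), (7, 50)],
    CancellationPolicy.STRICT: [(14, 50)],
}


def refund_amount_cents(total_paid_cents: int, policy: CancellationPolicy,
                        days_before_check_in: int) -> int:
    """Return the refund owed for a guest-initiated cancellation."""
    for min_days, percent in REFUND_SCHEDULE[policy]:
        if days_before_check_in >= min_days:
            # Integer arithmetic keeps currency amounts exact.
            return total_paid_cents * percent // 100
    # No band matched: the booking is non-refundable at this point.
    return 0


if __name__ == "__main__":
    # A firm-policy booking cancelled 10 days out falls in the 50% band.
    print(refund_amount_cents(50_000, CancellationPolicy.FIRM, 10))  # 25000
```

Keeping the tiers in a table rather than in branching code means each band is an explicit, testable outcome, which lines up with tasks that aim to cover every refund type in the policy.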

Phase 2: Baseline Implementation

  • Create minimal data schema: users, listings (with host-selected cancellation policies), reservations
  • Implement core agent tools: get_user_details, get_reservation_details, get_listing_details, cancel_reservation, process_refund, transfer_to_human_agents
  • Implement user tools for tau2 dual tool-use: get_user_id, get_reservation_id
  • Design base tasks: test simple cancellation and refund flows to ensure data, tools, and policy are functional
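A minimal sketch of the Phase 2 schema and core agent tools follows, assuming plain dataclasses and an in-memory database. The tool names come from this proposal, but every field name, signature, the `ToolError` type, and the return values are hypothetical. This does not use tau2's actual toolkit base classes or decorators, and it omits the user-side tools (`get_user_id`, `get_reservation_id`).

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class User:
    user_id: str
    name: str
    reservation_ids: List[str] = field(default_factory=list)


@dataclass
class Listing:
    listing_id: str
    host_id: str
    title: str
    cancellation_policy: str  # host-selected: flexible | moderate | firm | strict


@dataclass
class Reservation:
    reservation_id: str
    user_id: str
    listing_id: str
    check_in: str  # ISO date, e.g. "2026-02-01"
    total_paid_cents: int
    status: str = "confirmed"  # confirmed | cancelled
    refunded_cents: int = 0


@dataclass
class VacationRentalDB:
    users: Dict[str, User]
    listings: Dict[str, Listing]
    reservations: Dict[str, Reservation]


class ToolError(Exception):
    """Raised when a tool call violates a precondition."""


class VacationRentalTools:
    """Hypothetical agent tools; names follow the proposal, signatures are assumed."""

    def __init__(self, db: VacationRentalDB):
        self.db = db

    @staticmethod
    def _lookup(table, key, kind):
        if key not in table:
            raise ToolError(f"{kind} {key} not found")
        return table[key]

    def get_user_details(self, user_id: str) -> User:
        return self._lookup(self.db.users, user_id, "User")

    def get_reservation_details(self, reservation_id: str) -> Reservation:
        return self._lookup(self.db.reservations, reservation_id, "Reservation")

    def get_listing_details(self, listing_id: str) -> Listing:
        return self._lookup(self.db.listings, listing_id, "Listing")

    def cancel_reservation(self, reservation_id: str) -> Reservation:
        res = self.get_reservation_details(reservation_id)
        if res.status == "cancelled":
            raise ToolError(f"Reservation {reservation_id} is already cancelled")
        res.status = "cancelled"
        return res

    def process_refund(self, reservation_id: str, amount_cents: int) -> Reservation:
        res = self.get_reservation_details(reservation_id)
        if res.status != "cancelled":
            raise ToolError("Refunds require a cancelled reservation")
        if amount_cents <= 0 or res.refunded_cents + amount_cents > res.total_paid_cents:
            raise ToolError("Refund amount is invalid or exceeds the amount paid")
        res.refunded_cents += amount_cents
        return res

    def transfer_to_human_agents(self, summary: str) -> str:
        # Placeholder: a real environment would hand the conversation off here.
        return "Transfer successful"
```

Finding a booking's cancellation policy takes a chain of lookups (reservation, then listing, then `cancellation_policy`), and the precondition checks in `cancel_reservation` and `process_refund` are the kind of invalid-operation handling that simple cancellation and refund tasks can exercise.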

Phase 3: Objective Policy Adherence Tasks (~13 tasks)

Implement baseline tasks:

  • Lookup Chain Verification (3 tasks): Tests listing lookup
  • Refund Outcome Coverage (4 tasks): Tests each refund type in the policy definition
  • Rejection Guards (2 tasks): Validates error handling for invalid operations
  • Information Retrieval (2 tasks): Tests disambiguation and user profile lookup
  • Exception Flows (2 tasks): Tests policy overrides (free cancellation period, host-initiated cancellations)
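Objective tasks like these are easiest to grade by final state. One simple approach is to replay a task's expected write actions on a fresh copy of the initial database and check that the agent's final database matches. A Rejection Guard task then has no expected actions, so it passes only if the agent leaves the data untouched. This is a sketch under assumptions: the `Task` and `ExpectedAction` shapes, the dict-based database, and the grading rule are all hypothetical, and this is not a description of tau2's evaluator or its `tasks.json` schema.

```python
import copy
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class ExpectedAction:
    name: str
    kwargs: Dict[str, Any]


@dataclass
class Task:
    task_id: str
    category: str  # e.g. "Refund Outcome Coverage", "Rejection Guards"
    user_instruction: str
    expected_actions: List[ExpectedAction] = field(default_factory=list)


def cancel_reservation(db: dict, reservation_id: str) -> None:
    res = db["reservations"][reservation_id]
    if res["status"] == "cancelled":
        raise ValueError("already cancelled")
    res["status"] = "cancelled"


def process_refund(db: dict, reservation_id: str, amount_cents: int) -> None:
    res = db["reservations"][reservation_id]
    if res["status"] != "cancelled":
        raise ValueError("refund requires a cancelled reservation")
    res["refunded_cents"] += amount_cents


# Only state-changing tools need replaying; read-only lookups leave the DB untouched.
WRITE_TOOLS = {"cancel_reservation": cancel_reservation, "process_refund": process_refund}


def replay(initial_db: dict, actions: List[ExpectedAction]) -> dict:
    """Apply the given write actions to a deep copy of the initial DB."""
    db = copy.deepcopy(initial_db)
    for action in actions:
        WRITE_TOOLS[action.name](db, **action.kwargs)
    return db


def grade(task: Task, initial_db: dict, agent_final_db: dict) -> bool:
    """Pass iff the agent's final DB equals the DB the expected actions produce."""
    return replay(initial_db, task.expected_actions) == agent_final_db
```

Comparing end states rather than exact call sequences lets an agent take any valid path (extra lookups, different phrasing) as long as the resulting cancellations and refunds are correct.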

Phase 4: Complex Judgment-Based Evaluations (Iteratively introduce tasks)

Once the objective baseline is established, iteratively define and implement more complex scenarios:

  • Ambiguous documentation claims requiring interpretation
  • Claims about major events backed by only partial evidence
  • Host vs. guest dispute scenarios (leveraging tau2 dual tool-use)
  • User claims that conflict with recorded data
  • Mid-trip cancellation scenarios

Impact

New Components:

  • data/tau2/domains/vacation_rental/ - Domain data (db.json, policy.md, tasks.json)
  • src/tau2/domains/vacation_rental/ - Domain implementation (tools, wiki)

Framework Components:

  • Domain registry (register new domain)
  • No changes to core evaluation framework expected

Timeline

Estimated completion: Jan 10th, 2026

Note: Phase 4 is ongoing; its tasks will be continually refined and expanded beyond the estimated completion date.

  1. Policy research and foundation: 6 hours
  2. Minimal data and tool implementation: 2 hours
  3. Baseline tasks with validation: 6 hours
  4. Expanded task coverage based on findings: 4 weeks

Dependencies

  • No external library dependencies beyond existing framework
