
RFT-Tinker: R2E-Gym Training with Tinker API + Agent Sandbox

Overview

An experimental setup for training code-generation models on the R2E-Gym dataset, using:

  • Tinker API for RL model training
  • Agent Sandbox for safe code execution
  • R2E-Gym Dataset (4.5K real-world GitHub issues)

The goal is to reproduce the DeepSWE experiments (42.2% Pass@1 on SWE-Bench-Verified).

Quick Start

1. Clone Repository

git clone https://github.com/novitalabs/rft-tinker.git
cd rft-tinker

2. Install Dependencies

python3 -m venv venv
source venv/bin/activate
pip install datasets huggingface-hub novita-sandbox tinker torch transformers

3. Configure API Keys

Copy the example environment file:

cp .env.example .env.local

Edit .env.local with your API keys:

# Agent Sandbox API Key (get from https://novita.ai)
NOVITA_API_KEY=your_novita_api_key_here

# Tinker API Token (get from Tinker platform)
TINKER_API_TOKEN=your_tinker_api_token_here

# Template IDs
NOVITA_TEMPLATE_BASE=vn9xnp3cm92x6rmqlgwc

Warning: Never commit .env.local with real credentials!
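The training scripts read these values from the environment at runtime. A minimal sketch of that lookup, assuming the variables from .env.local have been exported into the shell first (e.g. `set -a; source .env.local; set +a`); the helper name is illustrative, not part of the repo:

```python
import os

def load_required_env(name: str) -> str:
    """Fetch a required credential from the environment, failing loudly if absent."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"{name} is not set; add it to .env.local")
    return value

# Illustration only -- never hard-code real credentials in source files.
os.environ.setdefault("NOVITA_API_KEY", "dummy-key-for-illustration")
api_key = load_required_env("NOVITA_API_KEY")
```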

4. Run Tests

Test Agent Sandbox connectivity:

python -m tests.integration.test_novita_basic

Test R2E-Gym workflow:

python -m tests.integration.test_r2e_gym_workflow

5. Prepare Dataset

Download an R2E-Gym sample (50 instances):

python scripts/prepare_data/prepare_r2e_sample.py

Test dataset loading:

python -m tests.unit.test_dataset_loading

Project Structure

rft-tinker/
├── src/                    # Core source code
│   ├── datasets/           # Dataset utilities and repo mapping
│   ├── environments/       # Sandbox environment wrappers
│   ├── rollout/            # Multi-turn rollout pipeline
│   └── utils/              # Utility functions
├── tests/                  # All test files
│   ├── integration/        # Integration tests
│   ├── rollout/            # Rollout pipeline tests
│   └── unit/               # Unit tests
├── scripts/                # Utility scripts
├── templates/              # Agent Sandbox Dockerfile templates
├── docs/                   # Documentation
├── data/                   # Datasets (gitignored)
├── outputs/                # Generated outputs (gitignored)
├── tinker_r2e_training.py  # RL training script
├── tinker_sft_training.py  # SFT training script
└── .env.example            # API keys template

Training Scripts

RL Training (GRPO)

python tinker_r2e_training.py

Configuration (in script):

| Parameter     | Value | Purpose                        |
|---------------|-------|--------------------------------|
| GROUP_SIZE    | 10    | Parallel sandboxes per problem |
| MAX_STEPS     | 40    | Max actions per episode        |
| SAVE_INTERVAL | 2     | Checkpoint frequency (batches) |
| TEMPERATURE   | 1.0   | Sampling temperature           |
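In the script these would typically appear as module-level constants; a sketch using the table's values (constant names are taken from the table, though the in-script form may differ):

```python
# GRPO rollout/training hyperparameters (values from the table above).
GROUP_SIZE = 10     # parallel sandboxes (rollouts) per problem
MAX_STEPS = 40      # maximum agent actions per episode
SAVE_INTERVAL = 2   # save a checkpoint every 2 batches
TEMPERATURE = 1.0   # sampling temperature for rollouts
```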

SFT Training (Optional Warm-Start)

python tinker_sft_training.py

Converts gold patches into edit trajectories for a supervised fine-tuning warm start.

Weight Validation

python validate_sft_weights.py

Validates SFT checkpoint weights before RL training.

Agent Sandbox Templates

r2e-gym-base (vn9xnp3cm92x6rmqlgwc)

  • Python 3.8.10, pytest 8.3.5, numpy 1.24.4
  • Core: scipy, sympy, requests, pillow
  • For most Python repositories

r2e-gym-scientific

  • Adds: pandas, scikit-learn, matplotlib, seaborn, h5py
  • For scientific computing

r2e-gym-pillow

  • Pillow 10.4.0 with full image processing
  • For image-heavy repositories

Agent Sandbox API

from novita_sandbox.core import Sandbox

# Create sandbox
sandbox = Sandbox.create(
    api_key=api_key,
    template=template_id,
    timeout=3600
)

# Run commands (synchronous - no await)
result = sandbox.commands.run("echo 'Hello World'")
print(result.stdout)
print(result.exit_code)

# Write files
sandbox.files.write("/path/to/file.py", content.encode())

R2E-Gym Workflow

Standard evaluation workflow:

# 1. Clone repo at base commit
sandbox.commands.run(f"git clone {repo_url} /tmp/testbed")
sandbox.commands.run(f"cd /tmp/testbed && git checkout {base_commit}")

# 2. Apply model-generated patch
sandbox.files.write("/tmp/patch.diff", patch_content)
sandbox.commands.run("cd /tmp/testbed && git apply /tmp/patch.diff")

# 3. Run tests that should now pass (FAIL_TO_PASS)
result = sandbox.commands.run(f"cd /tmp/testbed && pytest {fail_tests}")

# 4. Run tests that should remain passing (PASS_TO_PASS)
result = sandbox.commands.run(f"cd /tmp/testbed && pytest {pass_tests}")

# 5. Compute reward
reward = 1.0 if all_tests_passed else 0.0
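Step 5 can be made concrete using pytest's exit-code convention (0 means every selected test passed). A minimal sketch with illustrative names:

```python
def compute_reward(fail_to_pass_exit: int, pass_to_pass_exit: int) -> float:
    """Binary outcome reward: 1.0 only when the previously failing tests now
    pass (FAIL_TO_PASS exits 0) and the existing tests still pass
    (PASS_TO_PASS exits 0); otherwise 0.0."""
    all_tests_passed = (fail_to_pass_exit == 0) and (pass_to_pass_exit == 0)
    return 1.0 if all_tests_passed else 0.0
```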

Dataset Schema

Each R2E-Gym instance contains:

{
    "instance_id": "orange3__2d9617bd",
    "repo": "orange3",
    "commit_hash": "2d9617bd0cb1f0ba61771258410ab8fae8e7e24d",
    "problem_statement": "[ISSUE] ...",
    "modified_files": [...],
    "test_files": ["test_1.py"],
    "test_codes": ["..."],
    "old_commit_exit_code": 1,  # Tests fail before fix
    "new_commit_exit_code": 0,  # Tests pass after fix
    "gold_patch": {...}
}
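A quick structural check against this schema (field names copied from the listing above; the validator itself is illustrative, not part of the repo):

```python
# Required top-level fields of an R2E-Gym instance (per the schema above).
REQUIRED_FIELDS = {
    "instance_id", "repo", "commit_hash", "problem_statement",
    "modified_files", "test_files", "test_codes",
    "old_commit_exit_code", "new_commit_exit_code", "gold_patch",
}

def validate_instance(instance: dict) -> bool:
    """True when the instance carries every field the pipeline expects."""
    return REQUIRED_FIELDS.issubset(instance)
```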

Available Actions (in Rollout Generator)

The rollout generator provides 8 tools for the model:

  1. bash - Execute shell commands
  2. read - Read file content (with line range support)
  3. search - Pattern search (grep -rn)
  4. find_file - Locate files by pattern
  5. list_dir - Directory listing (ls -lah)
  6. edit - Line-based file editing
  7. run_test - Execute test commands
  8. submit - Submit solution
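Most of these tools reduce to shell commands executed inside the sandbox; the mapping can be sketched as a dispatch table (illustrative only: `edit` and `submit` are omitted because they are not simple one-liners, and the real rollout generator's interface may differ):

```python
import shlex

def tool_to_command(tool: str, arg: str) -> str:
    """Translate a tool invocation into the shell command it would run."""
    quoted = shlex.quote(arg)
    table = {
        "bash": arg,                           # raw shell command
        "read": f"cat -n {quoted}",            # numbered file contents
        "search": f"grep -rn {quoted} .",      # recursive pattern search
        "find_file": f"find . -name {quoted}",
        "list_dir": f"ls -lah {quoted}",
        "run_test": f"pytest {quoted}",
    }
    if tool not in table:
        raise ValueError(f"unsupported tool: {tool}")
    return table[tool]
```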

Performance Notes

Based on actual training measurements:

| Phase                  | Duration   | % of batch |
|------------------------|------------|------------|
| Sandbox creation (10×) | ~21 s      | 1.2%       |
| Repository setup (10×) | ~2 min     | 6.7%       |
| Rollout execution      | ~25-28 min | ~90%       |
| Training update        | ~30 s      | 1.7%       |
| Sandbox cleanup        | ~15 s      | 0.8%       |

Key metrics:

  • Sandbox hot-start latency: 60-100ms/task
  • Concurrent sandboxes: Up to 150 per account
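A quick sanity check that the percentages in the table are consistent with the durations (midpoint values in seconds; purely illustrative arithmetic):

```python
# Approximate per-batch phase durations in seconds (midpoints from the table).
phases = {
    "sandbox_creation": 21,
    "repo_setup": 2 * 60,
    "rollout_execution": 26.5 * 60,   # midpoint of 25-28 min
    "training_update": 30,
    "sandbox_cleanup": 15,
}
total = sum(phases.values())          # roughly half an hour per batch
shares = {name: secs / total for name, secs in phases.items()}
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```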

DeepSWE Comparison

| Aspect   | DeepSWE             | This setup       |
|----------|---------------------|------------------|
| Model    | Qwen3-32B           | Qwen3-30B-A3B    |
| Hardware | 64× H100            | Tinker           |
| Dataset  | R2E-Gym (4.5K)      | Same ✅          |
| Sandbox  | Kubernetes + Docker | Agent Sandbox ✅ |
| Pass@1   | 42.2% (SOTA)        | TBD              |

Documentation

References

License

MIT License

