# Dataset Collection and Processing Guide

This guide covers how to collect teleoperation demonstration data and process datasets for training.

The LeHome Challenge supports collecting demonstration data through teleoperation using a keyboard or SO101 Leader Arms. The collected data is saved in the LeRobot dataset format and can be processed and merged for training.
This section covers the setup and configuration of SO101 Leader Arms for teleoperation.
For dual SO101 Leader Arms teleoperation:

1. **Physical Connection**
   - Connect both Leader Arms to the computer via USB
   - Ensure both arms are powered on
2. **Identify Serial Devices**
   ```bash
   ls /dev/ttyACM*  # Usually displays: /dev/ttyACM0, /dev/ttyACM1
   ```
3. **Grant Serial Permissions**
   ```bash
   sudo chmod 666 /dev/ttyACM0
   sudo chmod 666 /dev/ttyACM1
   ```
4. **Determine Left/Right Mapping**
   - Note which serial port corresponds to the left/right arm
   - Note: usually the USB port plugged in first is assigned `/dev/ttyACM0`
   - Example: Left Arm: `/dev/ttyACM0`, Right Arm: `/dev/ttyACM1`
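The device-identification step above can also be scripted. A minimal sketch (the glob pattern assumes a Linux host where the arms enumerate as CDC-ACM devices; the helper name is hypothetical):

```python
import glob

def find_leader_arm_ports():
    """List candidate serial ports for the Leader Arms (Linux CDC-ACM naming assumed)."""
    return sorted(glob.glob("/dev/ttyACM*"))

# With both arms plugged in, this would typically return
# ['/dev/ttyACM0', '/dev/ttyACM1']; the lower-numbered port is
# usually the arm that was plugged in first.
ports = find_leader_arm_ports()
print(ports)
```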
On first use, or after changing hardware, calibrate the SO101 Leader Arms:

```bash
python -m scripts.dataset_sim record \
    --teleop_device bi-so101leader \
    --device "cpu" \
    --recalibrate \
    --enable_cameras
```

Calibration Steps:
- Run the command above; the program will prompt for calibration
- Left Arm Calibration: move the left arm to the maximum/minimum positions of each joint to record joint limits
- Right Arm Calibration: repeat the process for the right arm
- Calibration data is saved automatically
- Press Ctrl+C to exit
Calibration File Location:

```
source/lehome/lehome/devices/lerobot/.cache/
├── left_arm_calibration.json
└── right_arm_calibration.json
```

Note: Calibration is required only once. Recalibrate if hardware is replaced or control feels inaccurate.
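If control feels off, the calibration JSON files can be sanity-checked directly. A minimal sketch, assuming each file maps joint names to recorded min/max limits (the actual key layout of `left_arm_calibration.json` may differ):

```python
import io
import json

# Example calibration content; the joint names and min/max layout are assumptions.
calib_text = '{"shoulder_pan": {"min": -110.0, "max": 110.0}, "gripper": {"min": 0.0, "max": 100.0}}'
calib = json.load(io.StringIO(calib_text))  # in practice: json.load(open(path))

for joint, limits in calib.items():
    # Each joint's recorded minimum must lie strictly below its maximum
    assert limits["min"] < limits["max"], f"suspicious limits for {joint}"
```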
Test the hardware calibration without recording:

```bash
python -m scripts.dataset_sim record \
    --teleop_device bi-so101leader \
    --device "cpu" \
    --enable_cameras
```

Verification Steps:
- The IsaacSim window opens displaying the scene
- Move the Leader Arm and verify the simulated robot follows correctly
- Confirm the left/right arm mapping is correct
- Press Ctrl+C to exit
Recommended command for dual-arm data collection:

```bash
python -m scripts.dataset_sim record \
    --teleop_device bi-so101leader \
    --garment_name Top_Long_Unseen_0 \
    --enable_record \
    --num_episode 10 \
    --log_success \
    --device "cpu" \
    --enable_cameras
```

Input Device Options:
| Parameter | Single-Arm | Dual-Arm | Description |
|---|---|---|---|
| `--teleop_device` | `keyboard` | `bi-keyboard` | Keyboard control |
| `--teleop_device` | `so101leader` | `bi-so101leader` | SO101 Leader device |
| `--port` | `/dev/ttyACM0` | - | Single-arm serial port |
| `--left_arm_port` | - | `/dev/ttyACM0` | Dual-arm left port |
| `--right_arm_port` | - | `/dev/ttyACM1` | Dual-arm right port |
| `--recalibrate` | Optional | Optional | Calibrate on first use |
Task Configuration:

| Parameter | Example | Description |
|---|---|---|
| `--task` | `LeHome-SO101-Direct-Garment-v2` | Single-arm task |
| `--task` | `LeHome-BiSO101-Direct-Garment-v2` | Dual-arm task |
| `--garment_name` | `Top_Long_Unseen_0` | Garment identifier |
| `--garment_version` | `Release` / `Holdout` | Asset version |
| `--task_description` | `"fold the garment"` | Task label for filtering |
Recording Options:

| Parameter | Default | Description |
|---|---|---|
| `--enable_record` | Required | Enable dataset recording |
| `--num_episode` | 20 | Number of episodes |
| `--step_hz` | 120 | Environment stepping rate (Hz) |
| `--log_success` | False | Enable real-time success checking and distance logging (useful for monitoring task progress) |
| `--disable_depth` | False | Disable depth maps (faster) |
| `--enable_pointcloud` | False | Prefer converting depth to pointclouds offline instead (see "Add pointcloud") |
| `--record_ee_pose` | False | Record end-effector pose |
| `--ee_urdf_path` | None | URDF path (required for `--record_ee_pose`) |
| `--ee_state_unit` | `rad` | Joint angle unit (`rad` or `deg`) |
Simulation Options (AppLauncher):

| Parameter | Default | Description |
|---|---|---|
| `--enable_cameras` | False | Enable camera rendering |
| `--headless` | False | Run in headless mode (no GUI) |
| `--device` | `cpu` | Use `cpu` to avoid garment physics issues |
Function Keys:
- B key: Start teleoperation (activate control; must be pressed before S)
- S key: Start recording the current episode
- N key: Save the current episode (mark as success)
- D key: Discard the current episode (re-record)
- ESC key: Abort recording and clear the buffer
- Ctrl+C: Exit the program
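The ordering constraint in the hotkey list (B before S) can be pictured as a tiny state machine. This sketch is illustrative only, not the recorder's actual implementation:

```python
class RecorderState:
    """Toy model of the recording hotkey workflow (names are illustrative)."""

    def __init__(self):
        self.teleop_active = False
        self.recording = False

    def press(self, key):
        if key == "b":
            # B activates teleoperation
            self.teleop_active = True
        elif key == "s" and self.teleop_active:
            # S only starts recording once teleoperation is active
            self.recording = True
        elif key in ("n", "d"):
            # N (save) or D (discard) ends the current episode
            self.recording = False

r = RecorderState()
r.press("s")                 # pressing S before B has no effect
assert not r.recording
r.press("b")
r.press("s")
assert r.recording
```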
Keyboard Control (for keyboard/bi-keyboard):
- Single-arm: WASD (position), QE (Z-axis), Arrow keys (orientation), Space/Shift (gripper)
- Dual-arm: Letter keys (T/G, Y/H, U/J, I/K, O/L, Q/A) for left arm, Number keys (1/2, 3/4, 5/6, 7/8, 9/0, [/] or -/+) for right arm
SO101 Leader Control:
- Directly move the physical Leader Arms; the simulated robots follow in real-time
View dataset metadata, frame data, and statistics:

```bash
python -m scripts.dataset inspect \
    --dataset_root Datasets/record/001
```

Additional Parameters:
- `--show_frames N`: Display the first N frames of sample data
- `--show_stats`: Display detailed statistics for numeric columns
Read and analyze dataset observation/action data:

```bash
python -m scripts.dataset read \
    --dataset_root Datasets/record/001 \
    --num_frames 10
```

Additional Parameters:
- `--num_frames N`: Number of frames to display
- `--episode N`: Read a specific episode index
- `--output_csv PATH`: Export data to a CSV file
- `--show_stats`: Display statistical information
Add end-effector pose to existing datasets offline (computes both observation.ee_pose and action.ee_pose):

```bash
python -m scripts.dataset augment \
    --dataset_root Datasets/record/001 \
    --urdf_path Assets/robots/so101_new_calib.urdf \
    --state_unit rad \
    --overwrite
```

Parameters:
- `--urdf_path`: Robot URDF file path
- `--state_unit`: Joint angle unit (`rad` or `deg`)
- `--overwrite`: Overwrite existing EE pose data if present
Output:
- Single-arm: 8 dimensions (position xyz + quaternion xyzw + gripper)
- Dual-arm: 16 dimensions (left arm 8D + right arm 8D)
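The dimension layout above can be sketched as follows (packing order taken from the list above; the helper name is hypothetical):

```python
import numpy as np

def make_ee_pose(position_xyz, quat_xyzw, gripper):
    """Pack one arm's EE pose: position xyz (3) + quaternion xyzw (4) + gripper (1) = 8D."""
    return np.concatenate([position_xyz, quat_xyzw, [gripper]]).astype(np.float32)

left = make_ee_pose([0.10, 0.00, 0.20], [0.0, 0.0, 0.0, 1.0], 0.5)
right = make_ee_pose([-0.10, 0.00, 0.20], [0.0, 0.0, 0.0, 1.0], 0.5)
dual = np.concatenate([left, right])  # dual-arm: left 8D followed by right 8D

assert left.shape == (8,)
assert dual.shape == (16,)
```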
Add pointclouds to existing datasets offline (results are stored as XYZRGB):

```bash
python scripts/utils/process_parquet_to_pc.py \
    --dataset_root Datasets/record/001 \
    --num_points 4096
```

Parameters:
- `--dataset_root`: Dataset root path
- `--num_points`: Number of points in each pointcloud

Output:
- Creates a `pointclouds` folder containing one folder per episode; each episode folder (e.g. `episode_000`) stores the pointcloud for each frame, such as `frame_000000.npz`
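A minimal sketch of writing and reading one such per-frame `.npz` archive; the key name `pointcloud` and the exact XYZRGB column order are assumptions, not confirmed by the conversion script:

```python
import os
import tempfile

import numpy as np

num_points = 4096
# Columns assumed to be: x, y, z, r, g, b
points = np.random.rand(num_points, 6).astype(np.float32)

# Write a compressed .npz like frame_000000.npz
path = os.path.join(tempfile.gettempdir(), "frame_000000.npz")
np.savez_compressed(path, pointcloud=points)

# Read it back and check the shape
pc = np.load(path)["pointcloud"]
assert pc.shape == (num_points, 6)
```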
Replay recorded datasets for visualization, verification, or data augmentation. Supports joint-angle actions or end-effector pose control (via IK).

Example Command:

```bash
python -m scripts.dataset_sim replay \
    --dataset_root Datasets/example/pant_long_merged \
    --num_replays 1 \
    --disable_depth \
    --device "cpu" \
    --enable_cameras
```

Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| `--task` | str | `LeHome-BiSO101-Direct-Garment-v2` | Task environment name |
| `--dataset_root` | str | `Datasets/record/001` | Input dataset directory |
| `--output_root` | str | `None` | Output directory; must be specified (e.g. `Datasets/replay_stored/example/record_top_long_release_10`) if `--save_successful_only` is set |
| `--num_replays` | int | 1 | Number of replays per episode |
| `--save_successful_only` | flag | False | Only save episodes that achieve success |
| `--start_episode` | int | 0 | Starting episode index (inclusive) |
| `--end_episode` | int | `None` | Ending episode index (exclusive; `None` = all episodes) |
| `--step_hz` | int | 60 | Environment stepping rate in Hz |
| `--use_random_seed` | flag | False | Use a random seed (no fixed seed) |
| `--seed` | int | 42 | Random seed for the environment (ignored if `--use_random_seed` is set) |
| `--task_description` | str | `fold the garment on the table` | Task description string |
| `--garment_cfg_base_path` | str | `Assets/objects/Challenge_Garment` | Base path of the garment configuration |
| `--particle_cfg_path` | str | `source/lehome/lehome/tasks/bedroom/config_file/particle_garment_cfg.yaml` | Path of the particle configuration |
| `--disable_depth` | flag | False | Disable depth observation (faster replay) |
| `--use_ee_pose` | flag | False | Use `action.ee_pose` control (Cartesian space, converted via IK) |
| `--ee_urdf_path` | str | `Assets/robots/so101_new_calib.urdf` | URDF file path (required for `--use_ee_pose`) |
| `--ee_state_unit` | str | `rad` | Joint angle unit: `rad` or `deg` |
| `--device` | str | `cuda:0` | Simulation device (`cpu` or `cuda:0`). Use `cpu` to avoid garment physics issues |
Notes:
- Pure Replay Mode: Omit `--output_root` to replay without saving (visualization only)
- Data Augmentation: Use `--num_replays N` with `--save_successful_only` to extract successful trajectories
- End-Effector Pose Control: Requires a dataset with an `action.ee_pose` field (recorded with `--record_ee_pose` or augmented using `augment_ee_pose`)
- IK Statistics: When using `--use_ee_pose`, IK success rate and error statistics are displayed
Remove specific features from datasets (e.g., depth maps, to reduce storage):

Remove a single feature:

```bash
lerobot-edit-dataset \
    --repo_id record_top_long_release_10/001 \
    --root Datasets/record/record_top_long_release_10/001 \
    --new_repo_id record_top_long_release_10/001_no_depth \
    --operation.type remove_feature \
    --operation.feature_names "['observation.top_depth']"
```

Batch remove features from multiple datasets:

```bash
cd Datasets/record/record_top_long_release_10
for i in {001..010}; do
    lerobot-edit-dataset \
        --repo_id record_top_long_release_10/$i \
        --root Datasets/record/record_top_long_release_10/$i \
        --new_repo_id record_top_long_release_10/${i}_no_depth \
        --operation.type remove_feature \
        --operation.feature_names "['observation.top_depth']"
done
```

Parameters:
- `--repo_id`: Dataset identifier. Used as a label in metadata and for downloading from the HuggingFace Hub if local files are not found. When `--root` is specified, it does not affect the actual file path.
- `--root`: Dataset root directory path (relative or absolute). This determines where the dataset files are actually located.
- `--new_repo_id`: New dataset identifier (optional; if omitted, the original dataset is renamed to `_old`)
- `--operation.type`: Operation type; use `remove_feature`
- `--operation.feature_names`: List of feature names to remove (Python list format)
Notes:
- Cannot remove required features: `timestamp`, `frame_index`, `episode_index`, `index`, `task_index`
- If `--new_repo_id` is specified, the original dataset remains unchanged
- Multiple features can be removed: `"['feature1', 'feature2']"`
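When scripting batch removals, a small guard mirroring the required-feature rule above can catch mistakes before any dataset is touched. A sketch (the helper name is hypothetical, not part of `lerobot-edit-dataset`):

```python
# Bookkeeping columns that must never be removed, per the note above
REQUIRED_FEATURES = {"timestamp", "frame_index", "episode_index", "index", "task_index"}

def validate_removal(feature_names):
    """Raise if any required feature appears in the removal list."""
    bad = REQUIRED_FEATURES & set(feature_names)
    if bad:
        raise ValueError(f"cannot remove required features: {sorted(bad)}")
    return feature_names

assert validate_removal(["observation.top_depth"]) == ["observation.top_depth"]
```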
Merge multiple datasets collected from different sessions into a single unified dataset.
Using a Python script:

```python
from pathlib import Path
from lerobot.datasets.lerobot_dataset import LeRobotDataset
from lerobot.datasets.dataset_tools import merge_datasets
from lerobot.utils.utils import init_logging

init_logging()

# Specify paths to the datasets you want to merge
dataset_paths = [
    Path("Datasets/record/record_top_long_release_10_no_depth_all/001_no_depth"),
    Path("Datasets/record/record_top_long_release_10_no_depth_all/002_no_depth"),
    Path("Datasets/record/record_top_long_release_10_no_depth_all/003_no_depth"),
    # Add more dataset paths as needed
]

# Output configuration
output_dir = Path("Datasets/record/top_long_merged")
output_repo_id = "top_long_merged"

# Load datasets
datasets = [
    LeRobotDataset(f"ds_{i:03d}", root=p)
    for i, p in enumerate(dataset_paths, 1)
]

# Merge datasets
merged = merge_datasets(datasets, output_repo_id, output_dir)
print(f"✓ Merged! Episodes: {merged.meta.total_episodes}, Frames: {merged.meta.total_frames}")
```

One-liner command:
```bash
python -c "
from pathlib import Path
from lerobot.datasets.lerobot_dataset import LeRobotDataset
from lerobot.datasets.dataset_tools import merge_datasets
from lerobot.utils.utils import init_logging
init_logging()
# Specify dataset paths to merge
dataset_paths = [
    Path('Datasets/record/record_top_long_release_10_no_depth_all/001_no_depth'),
    Path('Datasets/record/record_top_long_release_10_no_depth_all/002_no_depth'),
    Path('Datasets/record/record_top_long_release_10_no_depth_all/003_no_depth'),
]
output_dir = Path('Datasets/record/top_long_merged')
datasets = [LeRobotDataset(f'ds_{i:03d}', root=p) for i, p in enumerate(dataset_paths, 1)]
merged = merge_datasets(datasets, 'top_long_merged', output_dir)
print(f'✓ Merged! Episodes: {merged.meta.total_episodes}, Frames: {merged.meta.total_frames}')
"
```

Notes:
- All source datasets must have identical feature structures (except for removed features)
- Merged dataset contains all episodes from source datasets in sequential order
- Video files are automatically split into multiple files based on size limit (default: 200MB)
- Data files are automatically split into multiple files based on size limit (default: 100MB)
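The identical-feature-structure requirement can be checked up front before attempting a merge. In this sketch, plain dicts stand in for each dataset's feature definitions (e.g., the feature section of its `info.json`); the helper name is hypothetical:

```python
# Stand-in feature dictionaries for three datasets; the third still carries depth
features_a = {"observation.state": "float32", "action": "float32"}
features_b = {"observation.state": "float32", "action": "float32"}
features_c = {"observation.state": "float32", "action": "float32", "observation.top_depth": "uint16"}

def mergeable(feature_dicts):
    """True if every dataset exposes exactly the same feature structure."""
    first = feature_dicts[0]
    return all(d == first for d in feature_dicts[1:])

assert mergeable([features_a, features_b])        # identical structures: OK to merge
assert not mergeable([features_a, features_c])    # mismatched features: merge would fail
```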
Recorded data is saved in the LeRobot dataset format with the following structure:

```
Datasets/record/001/
├── data/
│   └── chunk-000/
│       ├── file-000.parquet
│       └── file-001.parquet
├── images/
│   ├── observation.images.top_rgb
│   ├── observation.images.left_rgb
│   └── observation.images.right_rgb
├── meta/
│   ├── episodes/
│   │   └── chunk-000/
│   │       └── file-000.parquet
│   ├── garment_info.json
│   ├── info.json
│   ├── stats.json
│   └── tasks.parquet
├── videos/
│   ├── observation.images.top_rgb
│   ├── observation.images.left_rgb
│   └── observation.images.right_rgb
└── pointclouds/
    ├── episode_000
    │   ├── frame_000000.npz
    │   ├── frame_000001.npz
    │   └── frame_000002.npz
    └── episode_001
```
- `observation.state`: Joint positions (6D for single-arm, 12D for dual-arm)
- `action`: Joint actions (same dimension as state)
- `observation.images.{camera}`: RGB images from different camera views
- `observation.top_depth`: Depth map from the top camera (if enabled)
- `observation.ee_pose`: End-effector pose (if `--record_ee_pose` is enabled)
- `task`: Task description string
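For reference, a dual-arm frame following the feature list above would carry these shapes (the image resolution shown is an assumption; the actual resolution depends on the camera configuration):

```python
import numpy as np

# Illustrative dual-arm frame; field names match the feature list above
frame = {
    "observation.state": np.zeros(12, dtype=np.float32),  # 12D joint positions for dual-arm
    "action": np.zeros(12, dtype=np.float32),             # same dimension as state
    "observation.images.top_rgb": np.zeros((480, 640, 3), dtype=np.uint8),  # resolution assumed
    "task": "fold the garment",
}

assert frame["action"].shape == frame["observation.state"].shape
```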
The meta/ directory contains several metadata files:

- `info.json`: Dataset metadata including total episodes, total frames, feature definitions, and dataset configuration
- `garment_info.json`: Custom metadata file containing the initial pose information for each episode:

  ```json
  {
    "Top_Long_Unseen_0": {
      "0": {
        "object_initial_pose": [...],
        "scale": [...],
        "garment_name": "Top_Long_Unseen_0"
      },
      "1": {...}
    }
  }
  ```

- `stats.json`: Statistical information about the dataset (e.g., mean and std for numeric features)
- `tasks.parquet`: Task descriptions for each episode in Parquet format
- `episodes/`: Episode-level metadata stored in Parquet format, organized by chunks
- Consistent Task Description: Use the same `--task_description` for similar tasks to enable better filtering during training
- Episode Quality: Use the D key to discard low-quality episodes during recording
- Recording Frequency: Use `--step_hz 120` for smooth recordings; lower values may miss fast motions
- Dataset Organization: Record different garment types or tasks in separate dataset directories, then merge them later
- Backup: Always back up your datasets before merging or processing