Commit 702e097

Merge pull request #38 from BitLegion/dev/agentic-benchmarking
One shot the benchmarking suite
2 parents 10e6a42 + 4b68db8 commit 702e097

4 files changed: 811 additions and 14 deletions
Lines changed: 161 additions & 0 deletions
@@ -0,0 +1,161 @@
1+
# RoboDM Agentic Benchmarking
2+
3+
This directory contains benchmarking tools for the RoboDM Agentic framework, designed to measure its performance on large-scale robotics datasets.
4+
5+
## DROID Dataset Benchmark
6+
7+
The `droid_benchmark.py` script benchmarks robodm-agentic on DROID (Distributed Robot Interaction Dataset), a large-scale robot manipulation dataset.
8+
9+
### Features
10+
11+
- **Parallel Ingestion**: Uses ThreadPoolExecutor to ingest trajectories in parallel
12+
- **Real Model Integration**: Uses actual LLM/VLM models (no mocks)
13+
- **Comprehensive Metrics**: Measures ingestion time, query performance, and batch-processing throughput
14+
- **Detailed Reporting**: Generates both text reports and JSON metrics
15+
- **Tool Calling**: Uses the new tool calling system instead of code generation
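
The parallel-ingestion feature can be pictured with a minimal sketch like the one below. It assumes ingestion is mostly I/O-bound, and `ingest_trajectory` is a hypothetical stand-in for the real conversion step, not the function used in `droid_benchmark.py`.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def ingest_trajectory(index: int) -> dict:
    """Hypothetical stand-in: pretend to convert one trajectory."""
    time.sleep(0.01)  # simulate I/O-bound work such as reading and encoding
    return {"index": index, "ok": True}


def ingest_parallel(num_trajectories: int, max_workers: int = 4) -> dict:
    """Ingest trajectories concurrently and return simple timing metrics."""
    start = time.perf_counter()
    results = []
    failures = 0
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(ingest_trajectory, i) for i in range(num_trajectories)]
        for future in as_completed(futures):
            try:
                results.append(future.result())
            except Exception:
                failures += 1  # a single bad trajectory should not abort the run
    elapsed = time.perf_counter() - start
    return {
        "ingested": len(results),
        "failed": failures,
        "total_time_s": elapsed,
        "avg_time_per_trajectory_s": elapsed / max(num_trajectories, 1),
    }


if __name__ == "__main__":
    print(ingest_parallel(num_trajectories=20, max_workers=4))
```

Threads suit this pattern when workers spend most of their time waiting on disk or network; CPU-heavy conversion may benefit less from additional threads.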
16+
17+
### Prerequisites
18+
19+
1. **Install Dependencies**:
20+
```bash
pip install tensorflow tensorflow-datasets numpy
```
23+
24+
2. **Install Ollama** (for local model inference):
25+
```bash
# Install ollama from https://ollama.ai
ollama pull qwen2.5:7b
ollama pull llava:7b
```
30+
31+
3. **Alternative: OpenAI** (if you prefer cloud models):
32+
```bash
export OPENAI_API_KEY="your-api-key-here"
```
35+
36+
### Usage
37+
38+
#### Basic Usage
39+
```bash
# Run with default settings (1000 trajectories, 4 workers)
python robodm_agentic/benchmarking/droid_benchmark.py
```
43+
44+
#### Custom Configuration
45+
```bash
# Run with custom parameters
python robodm_agentic/benchmarking/droid_benchmark.py \
    --num-trajectories 500 \
    --output-dir ./benchmark_results \
    --max-workers 8
```
52+
53+
#### Test Setup First
54+
```bash
# Test the setup without requiring tensorflow
python robodm_agentic/benchmarking/test_benchmark.py
```
58+
59+
### Command Line Options
60+
61+
- `--num-trajectories`: Number of DROID trajectories to ingest (default: 1000)
62+
- `--output-dir`: Directory to save trajectories and reports (default: a temporary directory)
63+
- `--max-workers`: Number of parallel workers for ingestion (default: 4)
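
For orientation, the three options and their documented defaults could be declared with `argparse` roughly as follows. This is a sketch, not the script's actual parser; `build_parser`, `resolve_output_dir`, and the temporary-directory prefix are assumptions made for illustration.

```python
import argparse
import tempfile
from typing import Optional


def build_parser() -> argparse.ArgumentParser:
    """Declare the three documented options with their documented defaults."""
    parser = argparse.ArgumentParser(description="DROID dataset benchmark")
    parser.add_argument("--num-trajectories", type=int, default=1000,
                        help="Number of DROID trajectories to ingest")
    parser.add_argument("--output-dir", type=str, default=None,
                        help="Directory to save trajectories and reports")
    parser.add_argument("--max-workers", type=int, default=4,
                        help="Number of parallel workers for ingestion")
    return parser


def resolve_output_dir(output_dir: Optional[str]) -> str:
    """Fall back to a fresh temporary directory when none is given (assumed behavior)."""
    if output_dir is not None:
        return output_dir
    return tempfile.mkdtemp(prefix="droid_benchmark_")  # prefix is hypothetical


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.num_trajectories, resolve_output_dir(args.output_dir), args.max_workers)
```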
64+
65+
### Output
66+
67+
The benchmark generates:
68+
69+
1. **Trajectory Files**: Converted DROID trajectories in RoboDM format (`.vla` files)
70+
2. **Benchmark Report**: `benchmark_report.txt` with comprehensive performance analysis
71+
3. **Metrics JSON**: `benchmark_metrics.json` with detailed timing and success data
72+
73+
### Performance Metrics
74+
75+
The benchmark measures:
76+
77+
- **Ingestion Performance**:
78+
- Total ingestion time
79+
- Average time per trajectory
80+
- Parallel processing efficiency
81+
- Data size statistics
82+
83+
- **Query Performance**:
84+
- Individual query response times
85+
- Batch query throughput
86+
- Success rates
87+
- Vision analysis performance
88+
89+
- **System Performance**:
90+
- Memory usage
91+
- CPU utilization
92+
- Model inference latency
93+
94+
### Example Output
95+
96+
```
================================================================================
DROID DATASET BENCHMARK REPORT
================================================================================

INGESTION METRICS:
Total trajectories: 1000
Total ingestion time: 45.23s
Average ingestion time per trajectory: 0.045s
Average trajectory size: 2.34MB
Total data size: 2340.00MB
Parallel workers: 4

QUERY PERFORMANCE METRICS:
Total queries: 15
Total query time: 67.89s
Average query time: 4.53s
Median query time: 3.21s
Min query time: 1.23s
Max query time: 12.45s

BATCH PROCESSING METRICS:
Batch queries: 5
Batch total time: 18.76s
Average time per query (batch): 3.75s
Throughput: 0.27 queries/second

QUERY SUCCESS RATES:
Successful queries: 14/15
Success rate: 93.3%
```
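
The query figures in a report of this shape can be derived from per-query timings. The sketch below shows one way to do so; `summarize_queries` and its dictionary keys are hypothetical names, and throughput is assumed to be the query count divided by total time, which matches the sample report (5 / 18.76 s ≈ 0.27 queries/second).

```python
import statistics
from typing import Dict, Sequence


def summarize_queries(times_s: Sequence[float], successes: Sequence[bool]) -> Dict[str, float]:
    """Compute summary statistics from per-query durations and success flags."""
    if not times_s or len(times_s) != len(successes):
        raise ValueError("need one success flag per non-empty list of timings")
    total = sum(times_s)
    return {
        "total_queries": len(times_s),
        "total_time_s": total,
        "avg_time_s": statistics.mean(times_s),
        "median_time_s": statistics.median(times_s),
        "min_time_s": min(times_s),
        "max_time_s": max(times_s),
        # Assumed definition: queries completed per second of total query time.
        "throughput_qps": len(times_s) / total,
        "success_rate_pct": 100.0 * sum(successes) / len(successes),
    }


if __name__ == "__main__":
    # Made-up timings for illustration only.
    print(summarize_queries([1.0, 2.0, 3.0, 6.0], [True, True, True, False]))
```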
127+
128+
### Troubleshooting
129+
130+
1. **TensorFlow Import Errors**: Install tensorflow and tensorflow-datasets
131+
2. **Ollama Connection Issues**: Ensure ollama is running and models are downloaded
132+
3. **Memory Issues**: Reduce `--num-trajectories` or `--max-workers`
133+
4. **Slow Performance**: Increase `--max-workers` for faster ingestion, keeping in mind that more workers also use more memory (see item 3)
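
For the Ollama connection item, a quick pre-flight check along the following lines may help. It is a sketch that assumes Ollama is serving on its default local address (`http://localhost:11434`) and queries its `/api/tags` endpoint for installed models; the helper names are hypothetical and not part of the benchmark.

```python
import json
import urllib.request
from typing import List, Optional

OLLAMA_URL = "http://localhost:11434"  # assumed default local address
REQUIRED_MODELS = ["qwen2.5:7b", "llava:7b"]  # models pulled in the prerequisites


def list_ollama_models(base_url: str = OLLAMA_URL, timeout: float = 2.0) -> Optional[List[str]]:
    """Return installed model names, or None if the server cannot be reached."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            payload = json.load(resp)
    except (OSError, ValueError):  # connection errors, timeouts, malformed JSON
        return None
    if not isinstance(payload, dict):
        return None
    return [m.get("name", "") for m in payload.get("models", [])]


def missing_models(installed: List[str], required: List[str] = REQUIRED_MODELS) -> List[str]:
    """Return the required models that are not installed."""
    return [name for name in required if name not in installed]


if __name__ == "__main__":
    installed = list_ollama_models()
    if installed is None:
        print("Ollama is not reachable; is the server running?")
    else:
        print("Missing models:", missing_models(installed) or "none")
```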
134+
135+
### Integration with RoboDM Features
136+
137+
The benchmark leverages RoboDM's optimization features:
138+
139+
- **Compression**: Uses libx264 video codec for efficient storage
140+
- **Parallel Loading**: Tests RoboDM's parallel trajectory loading
141+
- **Frame Selection**: Tests intelligent frame selection for vision queries
142+
- **Metadata Access**: Tests fast metadata retrieval
143+
144+
### Extending the Benchmark
145+
146+
To add new benchmark scenarios:
147+
148+
1. Add new queries to `self.benchmark_queries` in the `DROIDBenchmark` class
149+
2. Implement new metrics in the `metrics` dictionary
150+
3. Add custom analysis in the `generate_report` method
151+
4. Create new benchmark classes for different datasets
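
The extension steps might look roughly like the sketch below. The `DROIDBenchmark` class here is a minimal stand-in written for illustration, not the real class from `droid_benchmark.py`; the shapes of `benchmark_queries` (a list of strings) and `metrics` (a dict of lists), the example queries, and the `GripperBenchmark` subclass are all assumptions.

```python
class DROIDBenchmark:
    """Minimal stand-in for illustration only; the real class may differ."""

    def __init__(self) -> None:
        # Assumed shapes: a list of query strings and a dict of metric lists.
        self.benchmark_queries = ["How many trajectories involve picking up an object?"]
        self.metrics = {"query_times": []}

    def generate_report(self) -> str:
        lines = ["QUERY PERFORMANCE METRICS:"]
        lines.append(f"Total queries: {len(self.metrics['query_times'])}")
        return "\n".join(lines)


class GripperBenchmark(DROIDBenchmark):
    """Hypothetical scenario covering steps 1 to 3 of the list above."""

    def __init__(self) -> None:
        super().__init__()
        # Step 1: add a new query.
        self.benchmark_queries.append("Which trajectories end with the gripper closed?")
        # Step 2: add a new metric.
        self.metrics["gripper_query_times"] = []

    def generate_report(self) -> str:
        # Step 3: append custom analysis to the base report.
        base = super().generate_report()
        extra = f"Gripper queries timed: {len(self.metrics['gripper_query_times'])}"
        return f"{base}\n{extra}"


if __name__ == "__main__":
    bench = GripperBenchmark()
    bench.metrics["gripper_query_times"].append(1.5)
    print(bench.generate_report())
```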
152+
153+
### Performance Recommendations
154+
155+
Based on the measured query times, the benchmark makes one of the following recommendations:
156+
157+
- **High Query Times (>5s)**: Implement caching, batch processing, frame selection optimization
158+
- **Moderate Query Times (2-5s)**: Consider frame selection optimization and query result caching
159+
- **Good Performance (<2s)**: System is performing well
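
The thresholds above amount to a simple tiered check. The sketch below assumes the decision is keyed on the average query time and that a value of exactly 2s counts as good; neither detail is stated here, and `recommend` is a hypothetical name.

```python
def recommend(avg_query_time_s: float) -> str:
    """Map an average query time (seconds) to one of the three documented tiers."""
    if avg_query_time_s > 5.0:
        return ("High query times: implement caching, batch processing, "
                "and frame selection optimization")
    if avg_query_time_s > 2.0:
        return ("Moderate query times: consider frame selection optimization "
                "and query result caching")
    # Assumption: the boundary value of exactly 2.0 s falls into the good tier.
    return "Good performance: system is performing well"


if __name__ == "__main__":
    for sample in (12.45, 4.53, 1.23):
        print(f"{sample:.2f}s -> {recommend(sample)}")
```

Under these assumptions, the 4.53s average from the example report falls into the moderate tier.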
160+
161+
This benchmark helps identify bottlenecks and optimize the robodm-agentic framework for production use with large-scale robotics datasets.
