# RoboDM Agentic Benchmarking

This directory contains benchmarking tools for the RoboDM Agentic framework, designed to test its performance on large-scale robotics datasets.

## DROID Dataset Benchmark

The `droid_benchmark.py` script benchmarks robodm-agentic on DROID (Distributed Robot Interaction Dataset), a large-scale, in-the-wild robot manipulation dataset.

### Features

- **Parallel Ingestion**: Ingests trajectories in parallel with a `ThreadPoolExecutor`
- **Real Model Integration**: Calls actual LLM/VLM models (no mocks)
- **Comprehensive Metrics**: Measures ingestion time, query latency, and batch-processing throughput
- **Detailed Reporting**: Generates both a text report and a JSON metrics file
- **Tool Calling**: Uses the new tool-calling system instead of code generation

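A minimal sketch of the parallel-ingestion pattern, assuming one `ThreadPoolExecutor` task per trajectory. `ingest_one` and `ingest_parallel` are hypothetical names, not functions from `droid_benchmark.py`, and `ingest_one` only simulates the conversion work with a placeholder size.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def ingest_one(index: int) -> dict:
    """Hypothetical stand-in for converting one DROID episode into a RoboDM file."""
    start = time.perf_counter()
    # A real implementation would read the episode and write a .vla file here.
    size_bytes = 1024 * (index + 1)  # placeholder size, not real data
    return {"index": index, "seconds": time.perf_counter() - start, "size_bytes": size_bytes}


def ingest_parallel(num_trajectories: int, max_workers: int = 4) -> dict:
    """Ingest trajectories concurrently and aggregate the results."""
    results, failures = [], []
    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(ingest_one, i): i for i in range(num_trajectories)}
        for future in as_completed(futures):
            try:
                results.append(future.result())
            except Exception as exc:  # record the failure and keep going
                failures.append((futures[future], repr(exc)))
    wall_time = time.perf_counter() - wall_start
    return {
        "ingested": len(results),
        "failed": len(failures),
        "total_time_s": wall_time,
        # Wall-clock time amortized over all trajectories
        "avg_time_per_trajectory_s": wall_time / max(num_trajectories, 1),
        "total_size_bytes": sum(r["size_bytes"] for r in results),
    }


if __name__ == "__main__":
    print(ingest_parallel(num_trajectories=20, max_workers=4))
```

Threads suit this workload when most of the time goes to I/O or to native code that releases the GIL; heavily CPU-bound conversion in pure Python would favor `ProcessPoolExecutor` instead.
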
### Prerequisites

1. **Install Dependencies**:
   ```bash
   pip install tensorflow tensorflow-datasets numpy
   ```

2. **Install Ollama** (for local model inference):
   ```bash
   # Install Ollama from https://ollama.ai
   ollama pull qwen2.5:7b
   ollama pull llava:7b
   ```

3. **Alternative: OpenAI** (if you prefer cloud models):
   ```bash
   export OPENAI_API_KEY="your-api-key-here"
   ```

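A quick preflight check can catch the most common setup failures before a long run. This sketch is not part of the benchmark; it assumes Ollama's default local address (`http://localhost:11434`) and its `/api/tags` endpoint, which lists locally pulled models. The helper names are hypothetical.

```python
import importlib.util
import json
import os
import urllib.request

REQUIRED_OLLAMA_MODELS = ["qwen2.5:7b", "llava:7b"]


def missing_packages(names=("tensorflow", "tensorflow_datasets", "numpy")) -> list:
    """Return the import names that are not installed."""
    return [n for n in names if importlib.util.find_spec(n) is None]


def missing_models(tags: dict, required=REQUIRED_OLLAMA_MODELS) -> list:
    """Compare an Ollama /api/tags response against the models the benchmark needs."""
    available = {m.get("name") for m in tags.get("models", [])}
    return [m for m in required if m not in available]


def fetch_ollama_tags(host: str = "http://localhost:11434") -> dict:
    """Ask a running Ollama server which models are pulled locally."""
    with urllib.request.urlopen(f"{host}/api/tags", timeout=5) as resp:
        return json.load(resp)


if __name__ == "__main__":
    print("Missing packages:", missing_packages())
    if os.environ.get("OPENAI_API_KEY"):
        print("OPENAI_API_KEY is set; cloud models are available.")
    try:
        print("Missing Ollama models:", missing_models(fetch_ollama_tags()))
    except OSError as exc:  # URLError and timeouts are OSError subclasses
        print("Ollama server not reachable:", exc)
```

Empty lists from both checks mean the local-model prerequisites above are in place.
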
### Usage

#### Basic Usage
```bash
# Run with default settings (1000 trajectories, 4 workers)
python robodm_agentic/benchmarking/droid_benchmark.py
```

#### Custom Configuration
```bash
# Run with custom parameters
python robodm_agentic/benchmarking/droid_benchmark.py \
    --num-trajectories 500 \
    --output-dir ./benchmark_results \
    --max-workers 8
```

#### Test Setup First
```bash
# Test the setup without requiring TensorFlow
python robodm_agentic/benchmarking/test_benchmark.py
```

### Command Line Options

- `--num-trajectories`: Number of DROID trajectories to ingest (default: 1000)
- `--output-dir`: Directory to save trajectories and reports (default: a temporary directory)
- `--max-workers`: Number of parallel workers for ingestion (default: 4)

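The documented flags map directly onto `argparse`. This is a reconstruction from the option list above, assuming the script uses `argparse` and creates a temporary directory when `--output-dir` is omitted; `build_parser` and `resolve_output_dir` are hypothetical names.

```python
import argparse
import tempfile


def build_parser() -> argparse.ArgumentParser:
    """Parser mirroring the documented flags and defaults (a reconstruction, not the script's code)."""
    parser = argparse.ArgumentParser(description="Benchmark robodm-agentic on DROID")
    parser.add_argument("--num-trajectories", type=int, default=1000,
                        help="Number of DROID trajectories to ingest")
    parser.add_argument("--output-dir", default=None,
                        help="Directory for trajectories and reports (default: a temp directory)")
    parser.add_argument("--max-workers", type=int, default=4,
                        help="Number of parallel ingestion workers")
    return parser


def resolve_output_dir(output_dir):
    """Fall back to a fresh temporary directory when --output-dir is omitted."""
    return output_dir if output_dir else tempfile.mkdtemp(prefix="droid_benchmark_")


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.num_trajectories, resolve_output_dir(args.output_dir), args.max_workers)
```

`argparse` turns `--num-trajectories` into the attribute `num_trajectories`, so the dashes in the flag names never appear in Python identifiers.
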
### Output

The benchmark generates:

1. **Trajectory Files**: Converted DROID trajectories in RoboDM format (`.vla` files)
2. **Benchmark Report**: `benchmark_report.txt` with a comprehensive performance analysis
3. **Metrics JSON**: `benchmark_metrics.json` with detailed timing and success data

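The text/JSON report pair can be produced with the standard library alone. A minimal sketch, assuming metrics are kept in a plain dict; `write_outputs` and `load_metrics` are hypothetical helpers, and the metric key in the demo is a placeholder rather than the script's actual schema.

```python
import json
import tempfile
from pathlib import Path


def write_outputs(output_dir: str, metrics: dict, report_text: str) -> tuple:
    """Write the text report and the JSON metrics file side by side."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    report_path = out / "benchmark_report.txt"
    metrics_path = out / "benchmark_metrics.json"
    report_path.write_text(report_text, encoding="utf-8")
    # indent=2 keeps the JSON readable and easy to diff between runs
    metrics_path.write_text(json.dumps(metrics, indent=2), encoding="utf-8")
    return report_path, metrics_path


def load_metrics(output_dir: str) -> dict:
    """Read a previous run's metrics back, e.g. to compare two runs."""
    return json.loads((Path(output_dir) / "benchmark_metrics.json").read_text(encoding="utf-8"))


if __name__ == "__main__":
    run_dir = tempfile.mkdtemp(prefix="droid_benchmark_")
    write_outputs(run_dir, {"total_trajectories": 20}, "DROID DATASET BENCHMARK REPORT\n")
    print(load_metrics(run_dir))
```
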
### Performance Metrics

The benchmark measures:

- **Ingestion Performance**:
  - Total ingestion time
  - Average time per trajectory
  - Parallel processing efficiency
  - Data size statistics

- **Query Performance**:
  - Individual query response times
  - Batch query throughput
  - Success rates
  - Vision analysis performance

- **System Performance**:
  - Memory usage
  - CPU utilization
  - Model inference latency

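The system-level metrics can be collected with the standard library. A minimal sketch, assuming each measured operation can be wrapped in a single call; `profile_call` is a hypothetical helper, not the benchmark's actual instrumentation. `tracemalloc` only sees Python-level allocations and `time.process_time()` only counts this process, so memory held by native libraries such as TensorFlow, or CPU spent inside a separate Ollama server, is not included.

```python
import time
import tracemalloc


def profile_call(fn, *args, **kwargs) -> dict:
    """Run fn once and report wall-clock latency, CPU time, and peak Python heap use."""
    tracemalloc.start()
    wall_start, cpu_start = time.perf_counter(), time.process_time()
    try:
        result = fn(*args, **kwargs)
    finally:
        wall = time.perf_counter() - wall_start
        cpu = time.process_time() - cpu_start
        _, peak_bytes = tracemalloc.get_traced_memory()
        tracemalloc.stop()
    return {
        "result": result,
        "wall_time_s": wall,
        "cpu_time_s": cpu,
        # CPU/wall ratio: near 0 means waiting (e.g. on a model server);
        # it can exceed 1.0 when several threads in this process are busy.
        "cpu_utilization": cpu / wall if wall > 0 else 0.0,
        "peak_memory_mb": peak_bytes / (1024 * 1024),
    }


if __name__ == "__main__":
    print(profile_call(sorted, list(range(100_000, 0, -1)))["wall_time_s"])
```

A low `cpu_utilization` for a query usually means the client was waiting on the model, which points at model inference latency rather than local processing.
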
### Example Output

```
================================================================================
DROID DATASET BENCHMARK REPORT
================================================================================

INGESTION METRICS:
  Total trajectories: 1000
  Total ingestion time: 45.23s
  Average ingestion time per trajectory: 0.045s
  Average trajectory size: 2.34MB
  Total data size: 2340.00MB
  Parallel workers: 4

QUERY PERFORMANCE METRICS:
  Total queries: 15
  Total query time: 67.89s
  Average query time: 4.53s
  Median query time: 3.21s
  Min query time: 1.23s
  Max query time: 12.45s

BATCH PROCESSING METRICS:
  Batch queries: 5
  Batch total time: 18.76s
  Average time per query (batch): 3.75s
  Throughput: 0.27 queries/second

QUERY SUCCESS RATES:
  Successful queries: 14/15
  Success rate: 93.3%
```

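The derived fields in the report follow from a handful of raw measurements. This sketch shows one way to compute them, assuming per-query wall-clock timings are collected in a list; `summarize_queries` and `batch_throughput` are illustrative names, not functions from `droid_benchmark.py`. The last four lines reproduce figures from the example output.

```python
import statistics


def summarize_queries(times_s: list, successes: int) -> dict:
    """Derive the query-performance and success-rate fields from per-query timings."""
    return {
        "total_queries": len(times_s),
        "total_time_s": sum(times_s),
        "average_s": statistics.mean(times_s),
        "median_s": statistics.median(times_s),
        "min_s": min(times_s),
        "max_s": max(times_s),
        "success_rate_pct": 100.0 * successes / len(times_s),
    }


def batch_throughput(num_queries: int, batch_total_s: float) -> float:
    """Queries per second for a batch run: query count divided by wall-clock time."""
    return num_queries / batch_total_s


# Cross-check derived figures from the example report
print(round(45.23 / 1000, 3))                # 0.045  (avg ingestion time per trajectory)
print(round(67.89 / 15, 2))                  # 4.53   (average query time)
print(round(batch_throughput(5, 18.76), 2))  # 0.27   (queries/second)
print(round(100 * 14 / 15, 1))               # 93.3   (success rate, %)
```

In the example output the per-trajectory ingestion figure equals total ingestion time divided by trajectory count (45.23s / 1000), so with 4 workers it reflects amortized wall-clock time rather than the latency of converting a single trajectory.
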
### Troubleshooting

1. **TensorFlow Import Errors**: Install `tensorflow` and `tensorflow-datasets`
2. **Ollama Connection Issues**: Ensure the Ollama server is running and the required models (`qwen2.5:7b`, `llava:7b`) have been pulled
3. **Memory Issues**: Reduce `--num-trajectories` or `--max-workers`
4. **Slow Performance**: Increase `--max-workers` for faster ingestion, if memory allows

### Integration with RoboDM Features

The benchmark exercises RoboDM's optimization features:

- **Compression**: Uses the libx264 (H.264) video codec for efficient storage
- **Parallel Loading**: Tests RoboDM's parallel trajectory loading
- **Frame Selection**: Tests intelligent frame selection for vision queries
- **Metadata Access**: Tests fast metadata retrieval

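This README does not describe how RoboDM selects frames. As a hedged baseline, the simplest strategy is uniform temporal sampling: send the VLM `k` evenly spaced frames instead of the whole episode. `select_frames` is a hypothetical helper; smarter strategies (for example motion- or keyframe-based selection) would replace it.

```python
def select_frames(num_frames: int, k: int) -> list:
    """Pick k evenly spaced frame indices, including the first and last frame."""
    if num_frames <= 0 or k <= 0:
        return []
    if k >= num_frames:
        return list(range(num_frames))  # fewer frames than requested: keep them all
    if k == 1:
        return [num_frames // 2]  # a single frame: take the middle one
    step = (num_frames - 1) / (k - 1)  # step > 1 here, so indices never repeat
    return [round(i * step) for i in range(k)]


if __name__ == "__main__":
    print(select_frames(100, 5))  # [0, 25, 50, 74, 99]
```
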
### Extending the Benchmark

To add new benchmark scenarios:

1. Add new queries to `self.benchmark_queries` in the `DROIDBenchmark` class
2. Implement new metrics in the `metrics` dictionary
3. Add custom analysis in the `generate_report` method
4. Create new benchmark classes for different datasets

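The extension steps can be illustrated with a subclass. This is a hypothetical sketch: `BenchmarkStub` stands in for `DROIDBenchmark`, whose constructor and method signatures this README does not show. Only the names `benchmark_queries`, `metrics`, and `generate_report` come from the list above, and the query strings are placeholders.

```python
class BenchmarkStub:
    """Hypothetical stand-in for DROIDBenchmark; only the attribute names come from the README."""

    def __init__(self):
        self.benchmark_queries = ["placeholder query: count trajectories with a grasp"]
        self.metrics = {"query_times": []}

    def generate_report(self) -> str:
        times = self.metrics["query_times"]
        avg = sum(times) / len(times) if times else 0.0
        return f"Total queries: {len(self.benchmark_queries)}\nAverage query time: {avg:.2f}s"


class ExtendedBenchmark(BenchmarkStub):
    """Example extension: an extra query, an extra metric, and an extra report line."""

    def __init__(self):
        super().__init__()
        self.benchmark_queries += ["placeholder query: find drawer-opening trajectories"]  # step 1
        self.metrics["vision_query_times"] = []  # step 2

    def generate_report(self) -> str:  # step 3
        vision = self.metrics["vision_query_times"]
        return super().generate_report() + f"\nVision queries timed: {len(vision)}"


if __name__ == "__main__":
    print(ExtendedBenchmark().generate_report())
```
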
### Performance Recommendations

Based on the measured query times, the report recommends:

- **High query times (>5s)**: Implement caching, batch processing, and frame-selection optimization
- **Moderate query times (2–5s)**: Consider frame-selection optimization and query-result caching
- **Good performance (<2s)**: No changes needed; the system is performing well

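The tiering reduces to two thresholds. A minimal sketch, assuming the decision uses the average query time and that exactly 2s counts as good and exactly 5s as moderate (the README does not say which tier a boundary value falls in); `recommend` is a hypothetical name.

```python
def recommend(avg_query_time_s: float) -> str:
    """Map an average query time to the recommendation tiers listed above."""
    if avg_query_time_s > 5.0:
        return "high: add caching, batch processing, and frame-selection optimization"
    if avg_query_time_s > 2.0:
        return "moderate: consider frame-selection optimization and query-result caching"
    return "good: system is performing well"


print(recommend(4.53))  # moderate tier, using the example report's average query time
```
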
This benchmark helps identify bottlenecks and optimize the robodm-agentic framework for production use with large-scale robotics datasets.