Home
Complete documentation for the YOLOv8 to Hailo-8 HEF Model Generation Pipeline.
- Getting Started
- Architecture Overview
- Step-by-Step Guide
- Advanced Topics
- Troubleshooting
- API Reference
- Best Practices
- FAQ
This pipeline converts raw camera images into optimized neural network models that run on the Hailo-8 AI accelerator. The end result is a .hef file that can perform real-time object detection on a Raspberry Pi 5 at 60-100 FPS.
# Clone the repository
git clone <repository-url>
cd hailo_model_generator
# Install dependencies
python3 -m venv .venv
source .venv/bin/activate
pip install -r step1_data_preparation/requirements.txt
# Add some images to captured_images/ folder
# Run the complete pipeline
./run_pipeline.sh datasets/ my_model

Development Machine:
- OS: Ubuntu 20.04+ or similar Linux distribution
- CPU: x86_64 architecture (required for Hailo SDK)
- RAM: 8GB minimum, 16GB recommended
- GPU: NVIDIA GPU with CUDA support (optional, for faster training)
- Storage: 10GB free space
Deployment Target:
- Raspberry Pi 5
- Hailo-8 AI Accelerator Module
- Raspberry Pi Camera Module 3 or USB webcam
- Raspberry Pi OS (64-bit)
┌─────────────────────┐
│ Captured Images │ Raw camera images
│ (PNG/JPG) │
└──────────┬──────────┘
│ Step 1: Annotation
↓
┌─────────────────────┐
│ YOLO Dataset │ Images + labels
│ (train/val splits) │
└──────────┬──────────┘
│ Step 2: Training
↓
┌─────────────────────┐
│ PyTorch Model │ best.pt (~20MB)
│ (YOLOv8) │
└──────────┬──────────┘
│ Step 3: ONNX Export
↓
┌─────────────────────┐
│ ONNX Model │ With/without NMS
│ (opset 11) │
└──────────┬──────────┘
│ Step 4: HEF Compilation
↓
┌─────────────────────┐
│ HEF Binary │ Optimized for Hailo-8
│ (INT8 quantized) │ (~9MB)
└──────────┬──────────┘
│ Step 5: Deployment
↓
┌─────────────────────┐
│ Raspberry Pi 5 │ Real-time inference
│ + Hailo-8 │ 60-100 FPS
└─────────────────────┘
Training Phase (Development Machine):
- Raw images → YOLO annotations
- Annotated dataset → Trained PyTorch model
- PyTorch model → ONNX model
- ONNX model → HEF binary (with quantization)
Inference Phase (Raspberry Pi):
- Camera frame (RGB) → Preprocessing (resize, pad)
- UINT8 input → Hailo-8 inference
- Raw outputs → Python NMS (if needed)
- Filtered detections → Visualization/logging
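The preprocessing step above (resize, pad) is the usual YOLO letterboxing: scale the frame to fit the model input while preserving aspect ratio, then pad to a square. A minimal dependency-free sketch (a real pipeline would typically use cv2.resize with bilinear interpolation; the 114-gray padding value follows YOLO convention):

```python
import numpy as np

def letterbox(frame: np.ndarray, size: int = 640):
    """Resize with preserved aspect ratio, pad to size x size, keep UINT8.

    Uses nearest-neighbor resampling to stay dependency-free.
    Returns the padded image, the scale factor, and the (left, top) padding.
    """
    h, w = frame.shape[:2]
    scale = size / max(h, w)
    new_h, new_w = int(round(h * scale)), int(round(w * scale))
    # Nearest-neighbor index maps for the resize
    ys = (np.arange(new_h) / scale).astype(int).clip(0, h - 1)
    xs = (np.arange(new_w) / scale).astype(int).clip(0, w - 1)
    resized = frame[ys[:, None], xs]
    # Pad with gray (114) to the square model input, as YOLO letterboxing does
    pad_t = (size - new_h) // 2
    pad_l = (size - new_w) // 2
    canvas = np.full((size, size, frame.shape[2]), 114, dtype=np.uint8)
    canvas[pad_t:pad_t + new_h, pad_l:pad_l + new_w] = resized
    return canvas, scale, (pad_l, pad_t)

frame = np.zeros((480, 640, 3), dtype=np.uint8)  # mock camera frame
inp, scale, (pad_l, pad_t) = letterbox(frame)
print(inp.shape, inp.dtype, scale)  # (640, 640, 3) uint8 1.0
```

Note that the input stays UINT8 end to end; the same letterboxing must be reproduced for calibration data in Step 4.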
- YOLOv8: State-of-the-art object detection architecture
- Hailo-8: Neural processing unit with 26 TOPS performance
- ONNX: Intermediate format for model portability
- HailoRT: Runtime library for inference on Hailo-8
- Dataflow Compiler: Hailo's tool for model optimization
Goal: Create a labeled dataset in YOLO format.
Place raw images in captured_images/ directory:
captured_images/
├── drone_001.jpg
├── drone_002.jpg
└── ...

Best practices:
- Capture diverse scenes (different lighting, backgrounds, angles)
- Include images without objects (negative samples)
- Aim for 100+ images per class
- Use consistent resolution (640x640 recommended)
cd step1_data_preparation
python3 annotate_drones.py --dataset ../datasets/train

Interactive controls:
- Mouse: Click and drag to draw bounding boxes
- Keys 0-9: Select object class
- Space: Save current annotation and move to next image
- Backspace: Delete last box
- Escape: Exit without saving
Output format:
datasets/
├── train/
│ ├── images/
│ │ └── drone_001.jpg
│ └── labels/
│ └── drone_001.txt # YOLO format: class x y w h
└── data.yaml
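Each label line stores `class x_center y_center width height` with all coordinates normalized to [0, 1]. A minimal sketch of converting one line back to pixel corner coordinates:

```python
def yolo_to_pixels(line: str, img_w: int, img_h: int):
    """Convert one YOLO label line to (class_id, x1, y1, x2, y2) in pixels.

    YOLO stores: class x_center y_center width height, normalized to [0, 1].
    """
    cls, xc, yc, w, h = line.split()
    xc, yc = float(xc) * img_w, float(yc) * img_h
    w, h = float(w) * img_w, float(h) * img_h
    return int(cls), xc - w / 2, yc - h / 2, xc + w / 2, yc + h / 2

print(yolo_to_pixels("0 0.5 0.5 0.25 0.25", 640, 640))
# (0, 240.0, 240.0, 400.0, 400.0)
```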
python3 verify_annotations.py ../datasets

This checks for:
- Missing label files
- Invalid coordinates (outside [0,1] range)
- Empty annotations
- Mismatched image/label counts
Goal: Train a YOLOv8 model on your annotated dataset.
Ensure absolute paths in datasets/data.yaml:
train: /home/user/hailo_model_generator/datasets/train/images
val: /home/user/hailo_model_generator/datasets/val/images
nc: 2
names:
  0: drone
  1: IR-Drone

cd step2_training
python3 train_yolov8.py ../datasets/data.yaml --epochs 200 --batch 32

Parameters explained:
- --epochs: Number of training iterations (default: 200)
- --batch: Batch size (adjust based on GPU memory)
- --imgsz: Input image size (default: 640)
- --patience: Early stopping patience (default: 15)
- --name: Training run name (default: train_drone_ir)
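The --patience flag controls early stopping: training halts once validation mAP has not improved for that many consecutive epochs. Conceptually (a minimal sketch, not the Ultralytics implementation):

```python
def should_stop(map_history: list[float], patience: int = 15) -> bool:
    """Return True once mAP has not improved for `patience` consecutive epochs."""
    best_epoch = max(range(len(map_history)), key=map_history.__getitem__)
    return len(map_history) - 1 - best_epoch >= patience

# Best mAP at epoch 1, then 15 stagnant epochs -> stop
history = [0.50, 0.62] + [0.60] * 15
print(should_stop(history))  # True
```

This is why a large --epochs value is safe: with patience set, wasted epochs are cut off automatically.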
Training outputs are saved to runs/detect/<name>/:
runs/detect/train_drone_ir/
├── weights/
│ ├── best.pt # Best model checkpoint
│ └── last.pt # Last epoch checkpoint
├── results.csv # Training metrics
├── confusion_matrix.png
├── PR_curve.png
└── results.png # Loss/mAP plots
Key metrics:
- mAP50: Mean average precision at 50% IoU (aim for >0.8)
- mAP50-95: Stricter metric (aim for >0.5)
- Box loss: Should decrease steadily
- Class loss: Should converge to low value
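The final metrics can be read from the last row of results.csv with the standard library. The column names below follow recent Ultralytics releases and are an assumption; check your own results.csv header:

```python
import csv
import io

# A two-row sample in the shape of an Ultralytics results.csv; the
# "metrics/mAP50(B)" column names are an assumption based on recent releases.
sample = """epoch,metrics/mAP50(B),metrics/mAP50-95(B)
1,0.612,0.401
2,0.843,0.537
"""

rows = list(csv.DictReader(io.StringIO(sample)))
last = rows[-1]  # metrics from the final epoch
map50 = float(last["metrics/mAP50(B)"])
map50_95 = float(last["metrics/mAP50-95(B)"])
print(f"mAP50={map50:.3f} (target >0.8), mAP50-95={map50_95:.3f} (target >0.5)")
```

To read the real file, replace `io.StringIO(sample)` with `open("runs/detect/train_drone_ir/results.csv")`.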
If accuracy is low:
- Add more training images (especially hard examples)
- Increase epochs
- Verify label quality
- Balance class distribution
If training is slow:
- Reduce batch size
- Use smaller model (YOLOv8n instead of YOLOv8s)
- Enable GPU acceleration (CUDA)
If overfitting occurs:
- Add more diverse images
- Enable data augmentation
- Reduce epochs
- Use early stopping
Goal: Convert PyTorch model to ONNX format for Hailo compilation.
Option A: Without NMS (Recommended)
cd step3_onnx_export
python3 export_onnx_for_hailo.py ../step2_training/runs/detect/train/weights/best.pt

Output: best.onnx with 6 raw output tensors. Python NMS required in inference.
Option B: With NMS Embedded
python3 export_onnx_for_hailo.py ../step2_training/runs/detect/train/weights/best.pt --nms

Output: best_nms.onnx with NMS embedded. No Python NMS needed, but less flexible.
| Feature | Without NMS | With NMS |
|---|---|---|
| Output format | Raw predictions (6 tensors) | Final detections (1 tensor) |
| Python NMS needed | Yes | No |
| Flexibility | High (adjust thresholds at runtime) | Low (fixed at export) |
| HEF size | ~9MB | ~9MB |
| Compilation success | Always works | May fail for some models |
Recommendation: Start with nms=False (the default) and apply Python NMS in the inference script.
python3 verify_onnx_export.py best.onnx

This runs inference on a black image and checks for false positives.
Goal: Compile ONNX to Hailo Executable Format (HEF).
Download from Hailo Developer Zone:
- Get Hailo Dataflow Compiler SDK 3.33.0+
- Extract to step4_hef_compilation/.venv_hailo_full/
- Verify installation:
source step4_hef_compilation/.venv_hailo_full/bin/activate
python -c "import hailo_sdk_client; print('OK')"
deactivate

Calibration is used for INT8 quantization:
cd step4_hef_compilation
python3 prepare_calibration.py ../datasets --num-samples 64

This creates calibration_data/ with 64 preprocessed images (UINT8, 640x640).
Important: Images must be:
- Same preprocessing as training (resize, letterbox)
- UINT8 format [0-255], NOT float32
- Representative of inference data
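The requirements above can be sketched as a small builder that stacks preprocessed frames into a single UINT8 calibration array. This is an illustration of what prepare_calibration.py presumably produces, not its actual code; real preprocessing must letterbox each image exactly as the training pipeline does:

```python
import numpy as np

def build_calib_set(images, size=640, num_samples=64):
    """Stack preprocessed frames into an (N, size, size, 3) UINT8 array.

    Hypothetical sketch: real code would apply the training-time
    letterboxing; here frames are assumed to already be size x size.
    """
    n = min(num_samples, len(images))
    batch = np.zeros((n, size, size, 3), dtype=np.uint8)
    for i, img in enumerate(images[:n]):
        h, w = img.shape[:2]
        batch[i, :min(h, size), :min(w, size)] = img[:size, :size]
    # Hailo calibration expects UINT8 [0-255], never float32
    assert batch.dtype == np.uint8
    return batch

imgs = [np.random.randint(0, 256, (640, 640, 3), dtype=np.uint8) for _ in range(4)]
calib = build_calib_set(imgs, num_samples=64)
print(calib.shape, calib.dtype)  # (4, 640, 640, 3) uint8
```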
python3 compile_to_hef.py ../step3_onnx_export/best.onnx --output my_model

Compilation process:
- Parse ONNX (2-3 seconds)
- Optimize graph (10-15 seconds)
- Quantize to INT8 using calibration data (60-90 seconds)
- Compile for Hailo-8 architecture (30-60 seconds)
- Generate HEF binary
Output: models/my_model.hef (~9MB)
# Enable loose mode - suppresses shape warnings
enable_loose_mode
# Optimization level (0=fast, 2=best accuracy)
optimization_level(0)
# Batch size (1 for real-time inference)
batch_size(1)

When to change:
- Set optimization_level(2) for better accuracy (slower compile)
- Disable enable_loose_mode if you see runtime errors
- Increase batch_size for batch inference (not common on Pi)
Goal: Deploy HEF model to Raspberry Pi 5 and test inference.
On the Raspberry Pi:
# Install HailoRT runtime (not the SDK)
sudo apt update
sudo apt install python3-hailort
# Verify installation
python3 -c "import hailo_platform; print('HailoRT OK')"

From your development machine:
cd step5_raspberry_pi_testing
./deploy_to_pi.sh

The script will:
- Prompt for Pi IP address and credentials
- Copy HEF file and inference scripts
- Install dependencies
- Verify HailoRT installation
- Provide SSH command for testing
Manual password entry required for SSH/SCP operations.
SSH to the Pi and run:
cd ~/MODEL-GEN/scripts
python3 hailo_detect_live.py --model ../models/my_model.hef --headless

Inference modes:
- --headless: No display, save detections to files (recommended for SSH)
- --picamera: Use Raspberry Pi Camera Module
- --camera 0: Use USB webcam at /dev/video0
- --conf 0.35: Confidence threshold
- --iou 0.5: NMS IoU threshold
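The flags above could be declared with argparse roughly as follows. This is a hypothetical sketch of the script's interface, not its actual source:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """Sketch of how hailo_detect_live.py's flags might be declared."""
    p = argparse.ArgumentParser(description="Live Hailo-8 detection")
    p.add_argument("--model", required=True, help="Path to .hef file")
    p.add_argument("--headless", action="store_true", help="No display; save detections")
    p.add_argument("--picamera", action="store_true", help="Use Pi Camera Module")
    p.add_argument("--camera", type=int, default=0, help="USB webcam index (/dev/videoN)")
    p.add_argument("--conf", type=float, default=0.35, help="Confidence threshold")
    p.add_argument("--iou", type=float, default=0.5, help="NMS IoU threshold")
    return p

args = build_parser().parse_args(["--model", "my_model.hef", "--headless", "--conf", "0.4"])
print(args.model, args.headless, args.conf)  # my_model.hef True 0.4
```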
# 1. Preprocess
image = cv2.resize(frame, (640, 640)) # Keep UINT8
rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
# 2. Hailo inference
outputs = hailo_device.infer(rgb) # 6 output tensors
# 3. Decode outputs (DFL format)
boxes, scores, class_ids = decode_yolov8_outputs(outputs)
# 4. Apply Python NMS
boxes, scores, class_ids = nms(boxes, scores, class_ids)
# 5. Visualize or save
draw_boxes(frame, boxes, scores, class_ids)

The pipeline supports both ONNX-embedded NMS and Python NMS:
ONNX model → Hailo-8 → Final detections (1, 300, 6)
↓
Direct use (no Python NMS)
ONNX model → Hailo-8 → Raw predictions (6 tensors)
↓
Python NMS → Final detections
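The Python NMS stage is greedy IoU suppression over score-sorted boxes. A minimal NumPy version (an illustrative sketch, not the exact nms_postprocess.py implementation):

```python
import numpy as np

def nms(boxes: np.ndarray, scores: np.ndarray, iou_thresh: float = 0.5) -> list[int]:
    """Greedy NMS: keep the highest-scoring box, drop overlapping ones, repeat.

    boxes: (N, 4) array of [x1, y1, x2, y2]; returns indices of kept boxes.
    """
    order = scores.argsort()[::-1]  # highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        if order.size == 1:
            break
        rest = order[1:]
        # Intersection of the current best box with the remaining boxes
        x1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter)
        order = rest[iou <= iou_thresh]  # suppress boxes overlapping the kept one
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [50, 50, 60, 60]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
print(nms(boxes, scores))  # [0, 2] — box 1 is suppressed by box 0
```

Because the threshold is an argument, it can be tuned at runtime, which is the flexibility advantage of exporting without embedded NMS.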
Python NMS Script:
step5_raspberry_pi_testing/nms_postprocess.py can be used standalone:
python3 nms_postprocess.py --input raw_output.npy --output detections.npz

To add or modify classes:
- Update data.yaml:
nc: 3 # Number of classes
names:
  0: drone
  1: IR-Drone
  2: bird # New class
- Re-annotate images with new class
- Retrain model
- Update inference script class names
For better accuracy:
- Use YOLOv8m or YOLOv8l (larger models)
- Increase training epochs
- Add more training data
- Use test-time augmentation
For better speed:
- Use YOLOv8n (nano model)
- Reduce input size to 320x320
- Set optimization_level(0) in compilation
For smaller model size:
- Use YOLOv8n
- Reduce number of classes
- Prune unnecessary layers
For processing multiple images (not real-time):
# Load HEF with batch_size > 1
network_group_params.batch_size = 4
# Prepare batch
batch_images = np.stack([img1, img2, img3, img4])
# Infer
outputs = pipeline.infer({input_name: batch_images})

Run multiple inference instances:
# Terminal 1
python3 hailo_detect_live.py --model model.hef --camera 0 --headless
# Terminal 2
python3 hailo_detect_live.py --model model.hef --camera 1 --headless

Note: Hailo-8 can handle multiple streams, but CPU/memory may become the bottleneck.
Cause: HEF wasn't compiled or is in wrong directory.
Solution:
# Check if HEF exists
ls -lh step4_hef_compilation/*.hef
ls -lh models/*.hef
# Move if needed
mv step4_hef_compilation/*.hef models/

Cause: SSH connection failed.
Solution:
- Verify Pi is powered on: ping <pi_ip>
- Check SSH is enabled: sudo systemctl status ssh
- Test manual SSH: ssh user@<pi_ip>
- Check firewall: sudo ufw status
Cause: Deployment script needs sshpass for password handling.
Solution:
sudo apt-get install sshpass

Cause: Model exported without NMS or with wrong settings.
Solution:
# Re-export with --nms flag
python3 export_onnx_for_hailo.py best.pt --nms
# Or use default (no NMS) and apply Python NMS
python3 export_onnx_for_hailo.py best.pt

Cause: .venv_hailo_full not installed.
Solution:
- Download Hailo Dataflow Compiler from hailo.ai
- Extract to step4_hef_compilation/.venv_hailo_full/
- Verify: source .venv_hailo_full/bin/activate && python -c "import hailo_sdk_client"
Cause: Calibration images are float32 instead of UINT8.
Solution:
# Regenerate calibration data
rm -rf step4_hef_compilation/calibration_data
python3 step4_hef_compilation/prepare_calibration.py datasets --num-samples 64

Causes and solutions:
- CPU bottleneck: Optimize preprocessing, use smaller image size
- USB camera latency: Use Picamera2 instead
- Display overhead: Use --headless mode
- Memory swap: Add more RAM or reduce batch size
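To tell a CPU bottleneck apart from display or camera overhead, it helps to measure a rolling FPS around the inference loop. A minimal sketch (the FPSMeter name and the sleep stand-in are illustrative, not part of the project scripts):

```python
import time
from collections import deque

class FPSMeter:
    """Rolling-average FPS over the last `window` frames."""
    def __init__(self, window: int = 30):
        self.stamps = deque(maxlen=window)

    def tick(self) -> float:
        """Record a frame timestamp and return the current rolling FPS."""
        self.stamps.append(time.perf_counter())
        if len(self.stamps) < 2:
            return 0.0
        return (len(self.stamps) - 1) / (self.stamps[-1] - self.stamps[0])

meter = FPSMeter()
fps = 0.0
for _ in range(10):
    time.sleep(0.01)  # stand-in for preprocess + infer + postprocess
    fps = meter.tick()
print(f"{fps:.0f} FPS")
```

Timing each stage separately (preprocess, infer, NMS, draw) with the same approach pinpoints which one to optimize.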
Debug steps:
- Verify model accuracy in training metrics
- Test ONNX model with verify script
- Check preprocessing matches training
- Adjust confidence threshold: --conf 0.2
- Verify NMS is applied (for non-NMS models)
from step2_training.train_yolov8 import train_yolov8
# Train with custom parameters
train_yolov8(
data_yaml='datasets/data.yaml',
epochs=200,
batch=32,
imgsz=640,
patience=15,
name='my_training_run'
)

from step3_onnx_export.export_onnx_for_hailo import export_onnx_for_hailo
# Export without NMS
export_onnx_for_hailo(
pt_path='best.pt',
output_dir='exports',
imgsz=640,
nms=False # Python NMS required
)
# Export with NMS
export_onnx_for_hailo(
pt_path='best.pt',
output_dir='exports',
imgsz=640,
nms=True # No Python NMS needed
)from step4_hef_compilation.compile_to_hef import compile_onnx_to_hef
# Compile ONNX to HEF
hef_path = compile_onnx_to_hef(
onnx_path='best.onnx',
output_name='my_model',
calib_dir='calibration_data',
hailo_venv='.venv_hailo_full'
)

from step5_raspberry_pi_testing.hailo_detect_live import HailoDetector
# Initialize detector
detector = HailoDetector(
hef_path='model.hef',
conf_thresh=0.25,
iou_thresh=0.45
)
# Run detection
boxes, scores, class_ids = detector.detect(frame)
# Cleanup
detector.cleanup()

- Quality over quantity: 100 good annotations > 1000 poor ones
- Balance classes: Equal samples per class when possible
- Include edge cases: Occluded objects, different scales, lighting
- Negative samples: Images without objects prevent false positives
- Consistent annotation: Same person annotates all, or use guidelines
- Start small: Use YOLOv8n for quick iterations
- Monitor validation: Watch for overfitting
- Early stopping: Let patience=15 prevent wasted epochs
- Save checkpoints: Keep multiple model versions
- Document experiments: Track hyperparameters and results
- Verify ONNX: Always run verification script
- Test inference: Try ONNX model before HEF compilation
- Document NMS choice: Note whether model uses embedded or Python NMS
- Keep ONNX files: Don't delete after HEF compilation (for debugging)
- Test locally first: Run inference on development machine
- Profile performance: Measure FPS, latency, memory usage
- Log errors: Capture stdout/stderr for debugging
- Version models: Name HEF files with version/date
- Backup configurations: Save model configs and scripts
- Use SSH keys: Set up key-based authentication for Pi
- Update regularly: Keep Raspberry Pi OS and packages updated
- Restrict access: Firewall rules for Pi network access
- Secure models: Don't expose HEF files publicly (proprietary)
Q: Can I use this with other YOLO versions (YOLOv5, YOLOv10)?
A: Yes, but you'll need to modify export scripts. YOLOv8 is recommended for best Hailo compatibility.
Q: Does this work on Raspberry Pi 4?
A: No, Hailo-8 module requires Raspberry Pi 5 M.2 slot.
Q: Can I train on Windows?
A: Training works on Windows, but HEF compilation requires Linux (x86_64).
Q: How many images do I need for training?
A: Minimum 50 per class, recommended 100+, ideal 500+.
Q: What's the difference between Hailo SDK and HailoRT?
A: SDK is for development (model compilation), HailoRT is runtime library (inference only).
Q: Can I use custom backbones (ResNet, EfficientNet)?
A: Yes, but you'll need to export them to ONNX and ensure Hailo compatibility.
Q: How do I improve model accuracy?
A: More data, longer training, better annotations, or larger model (YOLOv8m/l).
Q: Can I deploy to other edge devices (Jetson, Coral)?
A: Not with HEF format. You'll need to export to TensorRT (Jetson) or TFLite (Coral).
Q: Is GPU required for training?
A: No, but highly recommended. GPU training is 10-50x faster than CPU.
Q: How do I update to a new model version?
A: Retrain with new data, export ONNX, compile new HEF, deploy to Pi.
Contributions are welcome! Please:
- Test changes thoroughly
- Update documentation
- Follow existing code style
- Add examples for new features
This project follows the parent project license terms.
Last Updated: December 11, 2025
Version: 1.0.0
Maintainer: arsatyants