
📢 Announcement: ALOcc is now integrated into OccStudio!

ALOcc has been merged into OccStudio.

This repository serves as the official archive for the original ICCV 2025 paper implementation. For the latest updates, bug fixes, and a more unified framework supporting multiple models, we highly recommend using OccStudio.

👉 Check out the new framework: https://github.com/cdb342/OccStudio


ALOcc: Adaptive Lifting-based 3D Semantic Occupancy and Cost Volume-based Flow Prediction

arXiv · ICCV 2025 · License · Python · PyTorch

ALOcc is a state-of-the-art, vision-only framework for dense 3D scene understanding. It transforms multi-camera 2D images into rich, spatiotemporal 3D representations, jointly predicting semantic occupancy grids and per-voxel motion flow. Our purely convolutional design achieves top-tier performance while offering a spectrum of models that balance accuracy and real-time efficiency, making it ideal for autonomous systems.
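At a glance, the model consumes the six surround-view camera images of a sample and emits two dense voxel-grid outputs. The sketch below shows only the conceptual input/output contract; the tensor names are hypothetical, and the 200 × 200 × 16 grid with 18 classes follows the Occ3D-nuScenes convention rather than the repository's actual API.

# Hypothetical input/output contract (illustrative shapes, not the real API)
import torch

imgs = torch.randn(1, 6, 3, 256, 704)         # (batch, cameras, C, H, W)

# Conceptually, a forward pass yields:
occupancy = torch.randn(1, 18, 200, 200, 16)  # per-voxel semantic logits (17 classes + free)
flow = torch.randn(1, 2, 200, 200, 16)        # per-voxel motion flow (vx, vy)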



🚀 Get Started

1. Installation

We recommend managing the environment with Conda.

# Clone this repository
git clone https://github.com/cdb342/ALOcc.git
cd ALOcc

# Create and activate the conda environment
conda create -n alocc python=3.8 -y
conda activate alocc

# Install PyTorch (example for CUDA 11.8, adjust if needed)
pip install torch==2.0.1+cu118 torchvision==0.15.2+cu118 --extra-index-url https://download.pytorch.org/whl/cu118

# Install MMCV (requires building C++ ops)
# Note: Using the stable 1.x branch for compatibility
git clone https://github.com/open-mmlab/mmcv.git
cd mmcv
git checkout 1.x
MMCV_WITH_OPS=1 pip install -e . -v
cd ..

# Install MMDetection and MMSegmentation
pip install mmdet==2.28.2 mmsegmentation==0.30.0

# Install the ALOcc framework in editable mode
pip install -v -e .

# Install remaining dependencies
pip install torchmetrics timm dcnv4 ninja spconv transformers IPython einops
pip install numpy==1.23.4 # Pin numpy version to avoid potential issues
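After installation, a quick import check can catch a broken MMCV build early. This is an optional sanity check rather than part of the official setup; the versions in the comments assume the CUDA 11.8 example above.

# Optional environment sanity check
import torch
import mmcv
from mmcv.ops import get_compiling_cuda_version, get_compiler_version

print("PyTorch:", torch.__version__)                   # expect 2.0.1+cu118 with the example above
print("CUDA available:", torch.cuda.is_available())
print("MMCV:", mmcv.__version__)                       # expect a 1.x release
print("Compiled CUDA:", get_compiling_cuda_version())  # fails if the C++ ops were not built
print("Compiler:", get_compiler_version())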

2. Data Preparation

nuScenes Dataset

  1. Download the full nuScenes dataset from the official website.
  2. Download the primary Occ3D-nuScenes annotations from the project page.
  3. (Optional) For extended experiments, download the other community annotations used below: SurroundOcc, OpenOcc v2 (plus its ray mask), and OpenOccupancy.

Please organize your data following this directory structure:

ALOcc/
├── data/
│   ├── nuscenes/
│   │   ├── maps/
│   │   ├── samples/
│   │   ├── sweeps/
│   │   ├── v1.0-test/
│   │   ├── v1.0-trainval/
│   │   ├── gts/                 # Main Occ3D annotations
│   │   ├── gts_surroundocc/     # (Optional) SurroundOcc annotations
│   │   ├── openocc_v2/          # (Optional) OpenOcc annotations
│   │   ├── openocc_v2_ray_mask/ # (Optional) OpenOcc ray mask
│   │   └── nuScenes-Occupancy-v0.1/ # (Optional) OpenOccupancy annotations
...

Finally, run the preprocessing scripts to prepare the data for training:

# 1. Extract semantic segmentation labels from LiDAR
python tools/nusc_process/extract_sem_point.py

# 2. Create formatted info files for the dataloader
PYTHONPATH=$(pwd):$PYTHONPATH python tools/create_data_bevdet.py

Alternatively, you can download the pre-processed segmentation labels and the train.pkl / val.pkl info files from our Hugging Face Hub, and organize them as follows:

ALOcc/
├── data/
│   ├── lidar_seg
│   ├── nuscenes/
│   │   ├── train.pkl
│   │   ├── val.pkl
│   │   ...
...
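Either way, it is worth confirming that the info files load before launching a training run. A minimal check, assuming the usual BEVDet-style layout (a dict with 'infos' and 'metadata' keys; the exact keys in this repo may differ):

# Sanity-check an info file (key names are assumptions based on BEVDet conventions)
import pickle

with open("data/nuscenes/train.pkl", "rb") as f:
    infos = pickle.load(f)

print(type(infos))
if isinstance(infos, dict):
    print("keys:", list(infos.keys()))
    print("num samples:", len(infos.get("infos", [])))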

3. Pre-trained Models

For training, please download pre-trained image backbones from BEVDet, GeoMIM, or our Hugging Face Hub. Place the checkpoint files in the ckpts/pretrain/ directory.
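If a run fails while loading the backbone, inspecting the checkpoint is a quick first step. A minimal sketch; the file name below is a placeholder for whichever backbone you downloaded:

# Inspect a downloaded backbone checkpoint (file name is a placeholder)
import torch

ckpt = torch.load("ckpts/pretrain/your_backbone.pth", map_location="cpu")
state = ckpt.get("state_dict", ckpt)   # some checkpoints wrap the weights in 'state_dict'
print("num tensors:", len(state))
for name in list(state)[:5]:           # peek at the first few parameter names/shapes
    print(name, tuple(state[name].shape))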


🎮 Train & Evaluate

Training

Use the provided script for distributed training on multiple GPUs.

# Syntax: bash tools/dist_train.sh [CONFIG_FILE] [WORK_DIR] [NUM_GPUS]

# Example: Train the ALOcc-3D model with 8 GPUs
bash tools/dist_train.sh configs/alocc/alocc_3d_256x704_bevdet_preatrain.py work_dirs/alocc_3d 8

Testing

Download our official pre-trained models from the ALOcc Hugging Face Hub and place them in the ckpts/ directory.

# Evaluate semantic occupancy (mIoU) or occupancy flow
# Syntax: bash tools/dist_test.sh [CONFIG_FILE] [CHECKPOINT_PATH] [NUM_GPUS]

# Example: Evaluate the pre-trained ALOcc-3D model
bash tools/dist_test.sh configs/alocc/alocc_3d_256x704_bevdet_preatrain.py ckpts/alocc_3d.pth 8

# Evaluate semantic occupancy (RayIoU metric)
# Syntax: bash tools/dist_test_ray.sh [CONFIG_FILE] [CHECKPOINT_PATH] [NUM_GPUS]

# Example: Evaluate ALOcc-3D with the RayIoU script
bash tools/dist_test_ray.sh configs/alocc/alocc_3d_256x704_bevdet_preatrain_wo_mask.py ckpts/alocc_3d_wo_mask.pth 8

โš ๏ธ Important Note: When running inference with temporal fusion enabled, please use exactly 1 or 8 GPUs. Using a different number of GPUs may lead to incorrect results due to a sampler bug causing duplicate sample processing.

Benchmarking

We provide convenient tools to benchmark model latency (FPS) and computational cost (FLOPs).

# Benchmark FPS (Frames Per Second)
# Syntax: python tools/analysis_tools/benchmark.py [CONFIG_FILE]
python tools/analysis_tools/benchmark.py configs/alocc/alocc_3d_256x704_bevdet_preatrain.py

# Calculate FLOPs
# Syntax: python tools/analysis_tools/get_flops.py [CONFIG_FILE] --shape [HEIGHT] [WIDTH]
python tools/analysis_tools/get_flops.py configs/alocc/alocc_3d_256x704_bevdet_preatrain.py --shape 256 704
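To sanity-check an FPS number outside the provided tool, the usual recipe is a warm-up phase followed by synchronized timing. A minimal sketch, where model and sample stand in for a built ALOcc model and one preprocessed batch:

# Manual latency measurement (model and sample are placeholders)
import time
import torch

@torch.no_grad()
def measure_fps(model, sample, warmup=50, iters=200):
    model.eval()
    for _ in range(warmup):        # warm-up: exclude CUDA init and autotuning
        model(sample)
    torch.cuda.synchronize()       # ensure queued kernels have finished
    start = time.perf_counter()
    for _ in range(iters):
        model(sample)
    torch.cuda.synchronize()
    return iters / (time.perf_counter() - start)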

Visualization

First, ensure you have Mayavi installed. You can install it using pip:

pip install mayavi

Before you can visualize the output, you need to run the model on the test set and save the prediction results.

Use the dist_test.sh script with the --save flag. This will store the model's output in a directory.

# Example: Evaluate the ALOcc-3D model and save the predictions
bash tools/dist_test.sh configs/alocc/alocc_3d_256x704_bevdet_preatrain.py ckpts/alocc_3d_256x704_bevdet_preatrain.pth 8 --save

The prediction results will be saved in the test/ directory, following a path structure like: test/[CONFIG_NAME]/[TIMESTAMP]/.

Once the predictions are saved, you can run the visualization script. This script requires the path to the prediction results and the path to the ground truth data.

# Syntax: python tools/visual.py [PREDICTION_PATH] [GROUND_TRUTH_PATH]
# Example:
python tools/visual.py work_dirs/alocc_3d_256x704_bevdet_preatrain/xxxxxxxx_xxxxxx/ your/path/to/ground_truth
  • Replace work_dirs/alocc_3d_256x704_bevdet_preatrain/xxxxxxxx_xxxxxx/ with the actual path to the predictions saved in the previous step.
  • Replace your/path/to/ground_truth with the path to the corresponding ground truth dataset.

This will launch an interactive Mayavi window where you can inspect and compare the 3D occupancy predictions.
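If you want to render a single prediction directly rather than through tools/visual.py, the core Mayavi call is a cube-glyph scatter over the occupied voxels. A minimal sketch, assuming a saved (200, 200, 16) semantics array; the file name, array key, and free-space label 17 follow the Occ3D convention and are assumptions here:

# Render one occupancy prediction with Mayavi (file name and key are assumptions)
import numpy as np
from mayavi import mlab

occ = np.load("prediction.npz")["semantics"]   # (200, 200, 16) array of class ids
FREE = 17                                      # Occ3D free-space label
x, y, z = np.nonzero(occ != FREE)              # keep only occupied voxels
labels = occ[x, y, z]

mlab.points3d(x, y, z, labels, mode="cube", scale_mode="none",
              scale_factor=1.0, colormap="viridis")
mlab.show()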

📊 Results & Model Zoo

๐Ÿ† Performance on Occ3D-nuScenes (trained with camera visible mask)
Model Backbone Input Size mIoUDm mIoUm FPS Config Weights
ALOcc-2D-mini R-50 256 ร— 704 35.4 41.4 30.5 config HF Hub
ALOcc-2D R-50 256 ร— 704 38.7 44.8 8.2 config HF Hub
ALOcc-3D R-50 256 ร— 704 39.3 45.5 6.0 config HF Hub
๐Ÿ† Performance on Occ3D-nuScenes (trained w/o camera visible mask)
Model Backbone Input Size mIoU RayIoU RayIoU1m, 2m, 4m FPS Config Weights
ALOcc-2D-mini R-50 256 ร— 704 33.4 39.3 32.9, 40.1, 44.8 30.5 config HF Hub
ALOcc-2D R-50 256 ร— 704 37.4 43.0 37.1, 43.8, 48.2 8.2 config HF Hub
ALOcc-3D R-50 256 ร— 704 38.0 43.7 37.8, 44.7, 48.8 6.0 config HF Hub
๐Ÿ† Performance on OpenOcc (Semantic Occupancy and Flow)
Method Backbone Input Size Occ Score mAVE mAVETP RayIoU RayIoU1m, 2m, 4m FPS Config Weights
ALOcc-Flow-2D R-50 256 ร— 704 41.9 0.530 0.431 40.3 34.3, 41.0, 45.5 7.0 config HF Hub
ALOcc-Flow-3D R-50 256 ร— 704 43.1 0.549 0.458 41.9 35.6, 42.9, 47.2 5.5 config HF Hub

For more detailed results and ablations, please refer to our paper.


๐Ÿ™ Acknowledgement

This project is built upon the excellent foundation of several open-source projects. We extend our sincere gratitude to their authors and contributors.


📜 Citation

If you find ALOcc useful for your research or applications, please consider citing our paper:

@InProceedings{chen2025alocc,
    author    = {Chen, Dubing and Fang, Jin and Han, Wencheng and Cheng, Xinjing and Yin, Junbo and Xu, Chenzhong and Khan, Fahad Shahbaz and Shen, Jianbing},
    title     = {ALOcc: Adaptive Lifting-Based 3D Semantic Occupancy and Cost Volume-Based Flow Prediction},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2025},
}

@article{chen2024adaocc,
  title={AdaOcc: Adaptive Forward View Transformation and Flow Modeling for 3D Occupancy and Flow Prediction},
  author={Chen, Dubing and Han, Wencheng and Fang, Jin and Shen, Jianbing},
  journal={arXiv preprint arXiv:2407.01436},
  year={2024}
}
