Saimouli Katragadda1, Cho-Ying Wu2, Yuliang Guo2†, Xinyu Huang2, Guoquan Huang1, Liu Ren2
1University of Delaware 2Bosch Research North America † Project Lead
Webpage | Paper | Video | Pretrained Models
- Dense and sharp CLIP embeddings (192x192x768) at beyond-real-time speed, e.g., >40 FPS.
- A fully online system that seamlessly integrates dense CLIP features with Gaussian Splatting.
- Provides physical memory for real-time human–machine interaction.
We added RGB-L disentanglement in the "lang_disent" branch; see the corresponding section below for instructions. We also fixed some errors in the visualization.
mkdir -p data
cd data
wget https://huggingface.co/datasets/kxic/vMAP/resolve/main/vmap.zip
unzip vmap.zip

git clone https://github.com/rpng/online_lang_splatting.git --recursive
cd online_lang_splatting

Set up the environment.
conda env create -f environment.yaml
conda activate LangGS

💬 Language Model Setup
cd language/sed/open_clip
make install
python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'

Download the language model weights from
https://drive.google.com/file/d/1zAXE0QXy47n0cVn7j_2cSR85eqxdDGg8/view?usp=drive_link
Edit language/configs/convnextL_768.yaml and set WEIGHTS to the path of the downloaded language model weights.
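For illustration, a minimal sketch of this edit is below; the key may be nested differently in convnextL_768.yaml, and the path is a placeholder:

```yaml
# language/configs/convnextL_768.yaml (sketch; exact nesting may differ)
MODEL:
  WEIGHTS: /path/to/downloaded/language_model_weights.pth
```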
cd online_lang_splatting
python create_lang_model.py --config language/configs/convnextL_768.yaml

Download the pre-trained weights (HuggingFace link at the top) and place the models under "pretrained_models". We use the omni_general indoor-trained weights. If you have not downloaded them yet, you can do so with:
hf download slamDev/OnlineLanguageSplatting --repo-type=dataset
To test the language features on your own image, run

python3 language/language_features.py --high-res-model "pretrained_models/omni_general/high_res_71_indoor.ckpt" --lang-model "seg_clip_model_l.pth" --input "sample/replica_room0.jpg" --query-text "vase"

This will display the feature map and a heatmap localizing the query text.
Edit the paths in configs/rgbd/replicav2/base_config.yaml (see the sketch below):
- auto_ckpt_path: path to the generalized autoencoder checkpoint.
- lang_model_path: path to the language model weights (e.g., seg_clip_model_l.pth from the last step).
- hr_ckpt_path: path to the high-resolution module weights.
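A minimal sketch of these entries (the autoencoder file name is a placeholder; check base_config.yaml for the exact layout):

```yaml
# configs/rgbd/replicav2/base_config.yaml (sketch)
auto_ckpt_path: pretrained_models/generalized_autoencoder.ckpt          # generalized autoencoder
lang_model_path: pretrained_models/seg_clip_model_l.pth                 # language model weights
hr_ckpt_path: pretrained_models/omni_general/high_res_71_indoor.ckpt    # high-resolution module
```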
For the 2-stage pipeline, edit room0.yaml: set dataset_path to the room0 dataset, online_ckpt_path to where the online AE checkpoint of the 2-stage pipeline should be saved, and single_stage_ae to False (see the sketch below). In base_config.yaml, point auto_ckpt_path and hr_ckpt_path to the respective files.
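A minimal sketch of the 2-stage settings in room0.yaml (paths are placeholders):

```yaml
# configs/rgbd/replicav2/room0.yaml (2-stage sketch)
dataset_path: /path/to/Replica/room0
online_ckpt_path: /path/to/save/online_ae.ckpt   # where the online AE checkpoint is written
single_stage_ae: False
```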
To run the 1-stage pipeline, open room0.yaml and update the following parameters (see the sketch below):
- Set auto_ckpt_path to the cross-data generalization checkpoint file.
- Set single_stage_ae to True.
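A minimal sketch of the 1-stage settings (the checkpoint file name is a placeholder):

```yaml
# configs/rgbd/replicav2/room0.yaml (1-stage sketch)
auto_ckpt_path: pretrained_models/cross_data_generalization_ae.ckpt
single_stage_ae: True
```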
We use a 4-split strategy for training:
- Split 1: office0, room0
- Split 2: office1, room1
- Split 3: office2, room2
- Split 4: office3, office4

Training and testing example for the 4-split strategy:
- Run 1: Train on Splits 2, 3, 4 → Test on Split 1
- Run 2: Train on Splits 1, 3, 4 → Test on Split 2
- Run 3: Train on Splits 1, 2, 4 → Test on Split 3
- Run 4: Train on Splits 1, 2, 3 → Test on Split 4
The trained weights are provided in the pretrained weights folder; use the weights from the run whose test split contains the sequence you evaluate.
Example: For evaluating on room0 and office0, use weights from Run 1.
Note: edit the paths in base_config.yaml to specify the autoencoder path and the save directory path, respectively.
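For example, evaluating room0 with the Run 1 weights could look like the sketch below (the checkpoint file name and the save-directory key are placeholders; check base_config.yaml for the actual key names):

```yaml
# configs/rgbd/replicav2/base_config.yaml (evaluation sketch)
auto_ckpt_path: pretrained_models/run1_ae.ckpt   # AE trained on Splits 2, 3, 4
save_dir: results/replicav2/room0                # hypothetical key for the save directory
```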
python3 slam.py --config configs/rgbd/replicav2/room0.yaml

RGB-L Disentanglement

git checkout lang_disent
cd submodules/diff-gaussian-rasterization-disentangle-optim
pip install -e .
cd ../..
- Follow the previous steps to download the pretrained models. Edit the pretrained model paths in configs/rgbd/replicav2/base_config.yaml, room0_disent.yaml, and room0_disent_w_labels.yaml (edit the dataset path and the single-stage AE path; see the sketch after this list).
- Run the single-stage AE with RGB-L disentanglement:
python3 slam.py --config configs/rgbd/replicav2/room0_disent.yaml

- We also prepare high-resolution language feature labels from LangSplat. Edit the language label path (also covered in the sketch below) and run

python3 slam.py --config configs/rgbd/replicav2/room0_disent_w_labels.yaml

A GUI window will pop up and show the SLAM results.
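A minimal sketch of the entries to adjust in the disentanglement configs (all paths are placeholders, and lang_label_path is a hypothetical key name; check room0_disent.yaml and room0_disent_w_labels.yaml for the actual keys):

```yaml
# configs/rgbd/replicav2/room0_disent.yaml / room0_disent_w_labels.yaml (sketch)
dataset_path: /path/to/Replica/room0                      # Replica sequence
auto_ckpt_path: pretrained_models/single_stage_ae.ckpt    # single-stage AE checkpoint (placeholder filename)
single_stage_ae: True
lang_label_path: /path/to/langsplat_labels/room0          # hypothetical key: LangSplat high-res labels (w_labels config only)
```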
🔖 Create Labels
After running our training script (or LangSplat's training), the results are saved in a structure like <result_sequence>/psnr/before_opt/.
The following script first converts vMAP's prepared Replica segmentation maps to JSON format. See the argument help via create_replica_labels.py --help.
It saves the JSON files under our_result_dir/gt/label or langsplat_result_dir/gt/label.
python3 eval/create_replica_labels.py --langslam_dir <our_result_dir/psnr/before_opt> --langsplat_dir (optional) <langsplat_result_dir> --seg_file_config <path_to_render_config.yaml_of_a_seq>

To evaluate the 2-stage pipeline, run

python3 eval/evaluate_onlinelangslam.py --dataset_name <room0, room1, ...> --root_dir <our_result_dir> --ae_ckpt_dir <generalized_ae_path> --online_ae_ckpt <online_ae_path>

To evaluate the cross-data generalizable (single-stage) model, run

python3 eval/evaluate_langslam.py --dataset_name <room0, room1, ...> --root_dir <our_result_dir> --ae_ckpt_dir <pretrained_single_stage_ae_path (see pretrained weights folder)>

Prepare the colorized GT by running
cd eval/tsdf_fusion
python3 save_semantic_colors_gt.py

To reconstruct the TSDF for the ground truth, run

python3 dim3_recon_gt.py

Build PytorchEMD and copy the compiled .so file to the tsdf-fusion folder (one level up):

cd PytorchEMD; python3 setup.py

Then run

python3 3d_evaluation_and_visualize_langslam_dim15.py

For LangSplat:

python3 3d_evaluation_and_visualize_langsplat.py

🧪 Training
The language feature script can be used to save high- or low-resolution language feature labels for training the autoencoder on your own domain.

python3 language/autoencoder/train_encoder_light.py

There might be minor differences between the released version and the results in the paper. Please bear in mind that multi-process performance has some randomness due to GPU utilization. We ran all our experiments on an RTX A4500 GPU, and performance may differ on a different GPU.
This work builds on many open-source codebases. We extend our gratitude to their authors.
- If you see an error like "LangSupervisedNet doesn't have attribute load_from_checkpoint" at self.hr_model.load_from_checkpoint(...), it is a PyTorch Lightning version issue. Change those lines to call the classmethod directly, i.e., LangSupervisedNet.load_from_checkpoint(...).
If you find this work helpful, please consider citing us:
@inproceedings{katragadda2025_onlinelang,
title = {{O}nline {L}anguage {S}platting},
author = {Saimouli Katragadda and Cho-Ying Wu and Yuliang Guo and Xinyu Huang and Guoquan Huang and Liu Ren},
booktitle = {ICCV},
year = {2025}
}

