
Online Language Splatting

Saimouli Katragadda1, Cho-Ying Wu2, Yuliang Guo2†, Xinyu Huang2, Guoquan Huang1, Liu Ren2
1University of Delaware     2Bosch Research North America     †Project Lead
Webpage | Paper | Video | Pretrained Models

ICCV 2025

teaser

  • Dense and sharp CLIP embeddings (192x192x768) at beyond-real-time speed (>40 FPS).
  • A fully online system that seamlessly integrates dense CLIP features with Gaussian Splatting.
  • Physical memory for real-time human–machine interaction.
Sofa Demo | Rug Demo

Update:

We added RGBL-disentanglement support in the "lang_disent" branch; see the corresponding section below for instructions. We also fixed some errors in the visualization.

🚀 Getting Started

📦 Dataset

mkdir -p data
cd data
wget https://huggingface.co/datasets/kxic/vMAP/resolve/main/vmap.zip
unzip vmap.zip

🛠️ Installation

git clone https://github.com/rpng/online_lang_splatting.git --recursive
cd online_lang_splatting

Set up the environment:

conda env create -f environment.yaml
conda activate LangGS

💬 Language Model Setup

cd language/sed/open_clip
make install
python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'

Download language model weights from

https://drive.google.com/file/d/1zAXE0QXy47n0cVn7j_2cSR85eqxdDGg8/view?usp=drive_link

Edit language/configs/convnextL_768.yaml and set WEIGHTS to the path of the downloaded language model weights:

cd online_lang_splatting
python create_lang_model.py --config language/configs/convnextL_768.yaml

🧠 Language Features Demo

Download the pre-trained weights (HuggingFace link at the top) and place the models under "pretrained_models". We use the omni_general indoor-trained weights. You can download them (if you have not already) with the following command:

hf download slamDev/OnlineLanguageSplatting --repo-type=dataset

To test the language features on your own image, run:

python3 language/language_features.py --high-res-model "pretrained_models/omni_general/high_res_71_indoor.ckpt" --lang-model "seg_clip_model_l.pth" --input "sample/replica_room0.jpg" --query-text "vase"

This will display the feature map and a heatmap for localizing the queried text.
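Conceptually, the heatmap is just the per-pixel cosine similarity between the dense language feature map and the CLIP text embedding of the query. A minimal sketch of that idea (illustrative only, not the repo's script; the tensor shapes and the open_clip model name are assumptions):

import torch
import torch.nn.functional as F
import open_clip

# Stand-in for the dense language feature map produced by the high-res model, e.g. (768, 192, 192).
dense_feat = torch.randn(768, 192, 192)

# Encode the text query with an open_clip model (ViT-L-14 yields 768-d embeddings).
model, _, _ = open_clip.create_model_and_transforms("ViT-L-14", pretrained="openai")
tokenizer = open_clip.get_tokenizer("ViT-L-14")
with torch.no_grad():
    text_emb = F.normalize(model.encode_text(tokenizer(["vase"])), dim=-1)  # (1, 768)

# Cosine similarity between the query and every pixel feature -> localization heatmap.
pix = F.normalize(dense_feat.reshape(768, -1), dim=0)                       # (768, H*W)
heatmap = (text_emb @ pix).reshape(192, 192)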

🧭 Running the Pipeline

Edit the paths in configs/rgbd/replicav2/base_config.yaml:

  • auto_ckpt_path to load the generalized autoencoder;
  • lang_model_path to point to the language model weights (e.g., seg_clip_model_l.pth from the earlier language model setup);
  • hr_ckpt_path to point to the high-resolution module weights.

For room0.yaml, edit dataset_path to point to the room0 dataset, and online_ckpt_path to where you want the online AE checkpoint of the 2-stage pipeline to be saved.
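Before launching, it can help to check that the edited paths actually exist. A small hypothetical helper (not shipped with the repo) that walks the parsed YAML looking for the keys listed above:

import os
import yaml

PATH_KEYS = {"auto_ckpt_path", "lang_model_path", "hr_ckpt_path", "dataset_path", "online_ckpt_path"}

def collect_paths(node, found=None):
    # Recursively gather any of the known path keys from the parsed config.
    found = {} if found is None else found
    if isinstance(node, dict):
        for key, value in node.items():
            if key in PATH_KEYS and isinstance(value, str):
                found[key] = value
            else:
                collect_paths(value, found)
    elif isinstance(node, list):
        for item in node:
            collect_paths(item, found)
    return found

for cfg_file in ["configs/rgbd/replicav2/base_config.yaml", "configs/rgbd/replicav2/room0.yaml"]:
    with open(cfg_file) as f:
        cfg = yaml.safe_load(f)
    for key, path in collect_paths(cfg).items():
        print(f"{cfg_file}: {key} -> {'ok' if os.path.exists(path) else 'MISSING: ' + path}")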

▶️ Run the 2-Stage Pipeline

In base_config.yaml, point auto_ckpt_path and hr_ckpt_path to the respective checkpoint files, and in room0.yaml set single_stage_ae to False.

▶️ Run the 1-Stage Pipeline

To run the 1-stage pipeline, open room0.yaml and update the following parameters:

  • Set auto_ckpt_path to the cross-data generalization checkpoint file.
  • Set single_stage_ae to True.

We use a 4-split strategy for training:

  • Split 1: office0, room0
  • Split 2: office1, room1
  • Split 3: office2, room2
  • Split 4: office3, office4

Training and testing example for the 4-split strategy:
  • Run 1: Train on Splits 2, 3, 4 → Test on Split 1
  • Run 2: Train on Splits 1, 3, 4 → Test on Split 2
  • Run 3: Train on Splits 1, 2, 4 → Test on Split 3
  • Run 4: Train on Splits 1, 2, 3 → Test on Split 4

The weights are in the pretrained weights folder; use the appropriate ones for the sequence you evaluate. For example, to evaluate on room0 or office0, use the weights from Run 1.
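A tiny lookup sketch of this mapping (hypothetical convenience code, not part of the repo):

# Map each Replica sequence to the run whose held-out test split contains it.
SPLITS = {1: ["office0", "room0"],
          2: ["office1", "room1"],
          3: ["office2", "room2"],
          4: ["office3", "office4"]}

def run_for_scene(scene: str) -> int:
    # Return the run index whose weights should be used to evaluate `scene`.
    for run_id, scenes in SPLITS.items():
        if scene in scenes:
            return run_id
    raise ValueError(f"unknown scene: {scene}")

print(run_for_scene("room0"))  # -> 1, i.e. use the Run 1 weights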

Note: Edit the paths in base_config.yaml to specify the autoencoder path and the save directory path, respectively.

python3 slam.py --config configs/rgbd/replicav2/room0.yaml

🏹 Run RGBL-disentanglement

git checkout lang_disent

  1. Download the processed dataset on Replica-room0 here. Download the single-stage pretrained AE here.
  2. Install the disentangled rasterizer submodule:

cd submodules/diff-gaussian-rasterization-disentangle-optim
pip install -e .
cd ../..

  3. Follow the previous step to download the pretrained models. Edit the pretrained model path in configs/rgbd/replicav2/base_config.yaml, room0_disent.yaml, and room0_disent_w_labels.yaml (also edit the dataset path and the single-stage AE path).

  4. Run the single-stage AE with RGB-L disentanglement:

python3 slam.py --config configs/rgbd/replicav2/room0_disent.yaml

  5. We also prepare high-resolution language feature labels from LangSplat. Edit the language label path and run:

python3 slam.py --config configs/rgbd/replicav2/room0_disent_w_labels.yaml

A GUI window will pop up and show the SLAM results.

Evaluate

🔖 Create Labels

After running our training script (or LangSplat's training), the results are saved in a structure like <result_sequence>/psnr/before_opt/.

The following script first converts vMAP's prepared Replica segmentation maps to JSON format. See the argument help via create_replica_labels.py --help.

It will save the JSON files under our_result_dir/gt/label or langsplat_result_dir/gt/label.

python3 eval/create_replica_labels.py --langslam_dir <our_result_dir/psnr/before_opt> --langsplat_dir (optional) <langsplat_result_dir> --seg_file_config <path_to_render_config.yaml_of_a_seq>

✅ Evaluate 2-Stage Pipeline

To evaluate the 2-stage pipeline, run:

python3 eval/evaluate_onlinelangslam.py --dataset_name <room0, room1, ...> --root_dir <our_result_dir> --ae_ckpt_dir <generalized_ae_path> --online_ae_ckpt <online_ae_path>

✅ Evaluate 1-Stage Pipeline

To evaluate the cross-data generalizable (1-stage) pipeline, run:

python3 eval/evaluate_langslam.py --dataset_name <room0, room1, ...> --root_dir <our_result_dir> --ae_ckpt_dir <pretrained_single_stage_ae_path (see pretrained weights folder)> 

🧱 3D Evaluation

⚠️ Note: in each .py file, please read the comments and change the path variables to match your local setup.

Prepare the colorized ground truth by running:

cd eval/tsdf_fusion
python3 save_semantic_colors_gt.py

To reconstruct the TSDF for the ground truth, run:

python3 dim3_recon_gt.py
cd PytorchEMD; python3 setup.py

Copy the compiled .so file to the tsdf_fusion folder (one level up).

▶️ Run 3D Evaluation

LangSlam

python3 3d_evaluation_and_visualize_langslam_dim15.py

LangSplat

python3 3d_evaluation_and_visualize_langsplat.py

🧪 Training

To train your own AE on your domain for the 1-stage pipeline:

The language feature script can be used to save high- or low-resolution language feature labels for training the autoencoder on your own domain.

python3 language/autoencoder/train_encoder_light.py
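For orientation, this is roughly what training such a feature autoencoder looks like. The 768→15→768 sizes, layer widths, and the .pt label layout below are illustrative assumptions, not the actual train_encoder_light.py:

import glob
import torch
import torch.nn as nn

class FeatureAE(nn.Module):
    # Compress high-dimensional CLIP features to a small latent and reconstruct them.
    def __init__(self, in_dim=768, latent_dim=15):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(), nn.Linear(256, latent_dim))
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(), nn.Linear(256, in_dim))

    def forward(self, x):
        return self.decoder(self.encoder(x))

device = "cuda" if torch.cuda.is_available() else "cpu"
model = FeatureAE().to(device)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for epoch in range(10):
    total, n = 0.0, 0
    for path in glob.glob("lang_labels/*.pt"):        # per-frame (H*W, 768) feature labels (assumed layout)
        feats = torch.load(path).reshape(-1, 768).to(device)
        recon = model(feats)
        loss = nn.functional.mse_loss(recon, feats)
        opt.zero_grad(); loss.backward(); opt.step()
        total, n = total + loss.item(), n + 1
    print(f"epoch {epoch}: mean loss {total / max(n, 1):.4f}")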

🧬 Reproducibility

There might be minor differences between the released version and the results in the paper. Please bear in mind that multi-process performance has some randomness due to GPU utilization. We ran all our experiments on an RTX A4500 GPU, and performance may differ on a different GPU.

🙏 Acknowledgement

This work incorporates many open-source codes. We extend our gratitude to the authors of the software.

🎲 Known Errors

  • If you see an error like LangSupervisedNet doesn't have attribute load_from_checkpoint raised by self.hr_model.load_from_checkpoint(...), this is a PyTorch Lightning version issue. Edit those lines to call LangSupervisedNet.load_from_checkpoint(...) on the class instead, as sketched below.
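A sketch of the change (the surrounding names follow the error message above and are illustrative):

# Fails on some PyTorch Lightning versions, because load_from_checkpoint is a classmethod:
#   self.hr_model.load_from_checkpoint(...)
# Call it on the class instead:
#   LangSupervisedNet.load_from_checkpoint(...)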

📖 Citation

If you find this work helpful, please consider citing us:

@inproceedings{katragadda2025_onlinelang,
  title     = {{O}nline {L}anguage {S}platting},
  author    = {Saimouli Katragadda and Cho-Ying Wu and Yuliang Guo and Xinyu Huang and Guoquan Huang and Liu Ren},
  booktitle = {ICCV},
  year      = {2025}
}