
[ICLR 2026] FSOD-VFM: Few-Shot Object Detection with Vision Foundation Models and Graph Diffusion

Chen-Bin Feng1,2*,   Youyang Sha1*,   Longfei Liu1,   Yongjun Yu1,   Chi Man Vong2†,   Xuanlong Yu1†,   Xi Shen1†

1. Intellindust AI Lab    2. University of Macau
* Equal Contribution    † Corresponding Author




FSOD-VFM is a framework for few-shot object detection that leverages powerful vision foundation models (VFMs). It integrates three key components:

🔹 Universal Proposal Network (UPN) for category-agnostic bounding box generation
🔹 SAM2 for accurate mask extraction
🔹 DINOv2 features for efficient adaptation to novel object categories

To address over-fragmentation in proposals, FSOD-VFM introduces a novel graph-based confidence reweighting strategy for refining detections.
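For intuition, the matching step above boils down to scoring UPN proposal crops against DINOv2 prototypes built from the support examples. Below is a minimal sketch of that idea; it is not the repository's code, and embed, support_crops, and proposal_crops are illustrative names only.

# Illustrative sketch: DINOv2 prototype matching (not the repo's exact code)
import torch
import torchvision.transforms as T

# Load DINOv2 ViT-L/14 from torch.hub (fetches pretrained weights)
dino = torch.hub.load("facebookresearch/dinov2", "dinov2_vitl14").eval()

preprocess = T.Compose([
    T.Resize((224, 224)),          # 224 is divisible by the 14-pixel patch
    T.ToTensor(),
    T.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
])

@torch.no_grad()
def embed(crops):
    """L2-normalized DINOv2 CLS embeddings for a list of PIL image crops."""
    batch = torch.stack([preprocess(c) for c in crops])
    feats = dino(batch)                                  # (N, 1024)
    return torch.nn.functional.normalize(feats, dim=-1)

# support_crops: few-shot exemplars of one class; proposal_crops: UPN boxes
# cropped from the query image (both hypothetical inputs for this sketch).
# prototype = embed(support_crops).mean(0, keepdim=True)
# scores    = embed(proposal_crops) @ prototype.T        # cosine similarity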

If you find our work useful, please give us a ⭐!


Overview


🚀 Updates

  • [2026.2.3] Initial release of FSOD-VFM.

🧭 Table of Contents

  1. Datasets
  2. Quick Start
  3. Usage
  4. Citation
  5. Acknowledgement

1. Datasets

Put all datasets under FSOD-VFM/dataset/:

git clone https://github.com/Intellindust-AI-Lab/FSOD-VFM
cd FSOD-VFM 
mkdir dataset

Pascal VOC

Download Pascal VOC from http://host.robots.ox.ac.uk/pascal/VOC,
then put it under dataset/ following this structure:

    dataset/PascalVOC/
    ├── VOC2007/
    ├── VOC2007Test/
    │   └── VOC2007/
    │       ├── JPEGImages/
    │       └── ...
    └── VOC2012/

COCO

Download COCO from https://cocodataset.org and organize it as:

dataset/coco/
├── annotations/
├── train2017/
├── val2017/
└── test2017/

CD-FSOD

Download CD-FSOD from https://yuqianfu.com/CDFSOD-benchmark/, and organize as:

dataset/CDFSOD/
├── ArTaxOr/
├── clipart1k/
├── DIOR/
├── FISH/
├── NEU-DET/
└── UODD/

2. Quick Start

Environment Setup

conda env create -f fsod.yml
conda activate FSODVFM

DINOv2 Installation

# Ensure this is run inside the FSOD-VFM directory
git clone https://github.com/facebookresearch/dinov2.git

UPN Installation

conda install -c conda-forge gcc=9.5.0 gxx=9.5.0 ninja -y
cd chatrex/upn/ops
pip install -v -e .

SAM2 Installation

# Ensure this is run outside the FSOD-VFM directory
cd ../../../../
git clone https://github.com/facebookresearch/sam2.git && cd sam2
pip install -e .

Checkpoints

# The previous step left us inside sam2, a sibling of the project root.
# Make sure the checkpoints folder ends up inside the project root (FSOD-VFM/checkpoints).
cd ../FSOD-VFM && mkdir checkpoints && cd checkpoints
wget https://dl.fbaipublicfiles.com/segment_anything_2/092824/sam2.1_hiera_large.pt
wget https://dl.fbaipublicfiles.com/dinov2/dinov2_vitl14/dinov2_vitl14_pretrain.pth
wget https://github.com/IDEA-Research/ChatRex/releases/download/upn-large/upn_large.pth
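As a quick sanity check that the downloads succeeded, the snippet below loads the SAM2 predictor and the DINOv2 state dict from the project root. The sam2.1_hiera_l.yaml config name is the one shipped with the sam2 repo for this checkpoint; adjust the paths if your layout differs.

# Optional sanity check, run from the FSOD-VFM project root
import torch
from sam2.build_sam import build_sam2
from sam2.sam2_image_predictor import SAM2ImagePredictor

sam2_model = build_sam2("configs/sam2.1/sam2.1_hiera_l.yaml",
                        "checkpoints/sam2.1_hiera_large.pt", device="cpu")
predictor = SAM2ImagePredictor(sam2_model)

# The DINOv2 pretrain file should be a plain state dict of weight tensors
state = torch.load("checkpoints/dinov2_vitl14_pretrain.pth", map_location="cpu")
print(type(predictor).__name__, "ready;", len(state), "DINOv2 tensors loaded")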

3. Usage

Pascal VOC

sh run_scripts/run_pascal.sh

Tips:

  • Modify --json_path for different splits (split1, split2, split3) and shot settings (1shot, 5shot, etc.).

  • Modify --target categories for different splits.

  • Adjust hyperparameters:

    • --min_threshold: UPN confidence threshold (default: 0.01)
    • --alp: alpha for graph diffusion
    • --lamb: decay parameter for graph diffusion (see the reweighting sketch after this list)
  • To fix CRLF line-ending issues in the shell script:

    sed -i 's/\r$//' run_scripts/run_pascal.sh
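The graph diffusion behind --alp and --lamb can be pictured as propagating confidences over a proposal-similarity graph, so that a fragmented proposal surrounded by confident, similar neighbors gets pulled up. The sketch below shows one common form of such an update; it is an illustration under assumed semantics for alp (mixing weight) and lamb (per-step decay), not the repository's exact formula.

# Illustrative sketch: confidence reweighting by graph diffusion
import torch

def diffuse_scores(sim, scores, alp=0.5, lamb=0.9, steps=10):
    """Propagate detection confidences over a proposal-similarity graph.

    sim:    (N, N) pairwise proposal similarity (e.g., feature cosine or IoU)
    scores: (N,)   initial detection confidences
    alp:    mixing weight between diffused and original scores
    lamb:   per-step decay on the propagated signal
    """
    adj = sim / sim.sum(dim=1, keepdim=True).clamp(min=1e-8)  # row-normalize
    s = scores.clone()
    for t in range(steps):
        s = (1 - alp) * scores + alp * (lamb ** t) * (adj @ s)
    return s

# Example: proposals 0 and 1 overlap heavily; the low-scoring fragment (index 1)
# is reweighted upward by its confident neighbor, while proposal 2 is untouched.
sim = torch.tensor([[1.0, 0.8, 0.1],
                    [0.8, 1.0, 0.1],
                    [0.1, 0.1, 1.0]])
scores = torch.tensor([0.9, 0.2, 0.7])
print(diffuse_scores(sim, scores))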

COCO

sh run_scripts/run_coco.sh

Tips:

  • Modify --json_path for 10shot or 30shot.
  • Target categories are fixed to the standard 20 novel COCO classes.

CD-FSOD

sh run_scripts/run_cdfsod.sh

Tips:

  • Modify --json_path, --test_json, and --test_img_dir for different subsets (e.g., ArTaxOr, DIOR).

  • For DIOR, use:

    --test_img_dir ./dataset/CDFSOD/DIOR/test/new_test/
    

4. Citation

If you use FSOD-VFM in your research, please cite:

@inproceedings{feng2025fsodvfm,
  title={Few-Shot Object Detection with Vision Foundation Models and Graph Diffusion},
  author={Feng, Chen-Bin and Sha, Youyang and Liu, Longfei and Yu, Yongjun and Vong, Chi Man and Yu, Xuanlong and Shen, Xi},
  booktitle={The Fourteenth International Conference on Learning Representations},
  year={2026}
}

5. Acknowledgement

Our work builds upon excellent open-source projects including No-Time-To-Train, SAM2, ChatRex, and DINOv2. We sincerely thank their authors for their contributions to the community.
