
[ICLR 2026] FSOD-VFM: Few-Shot Object Detection with Vision Foundation Models and Graph Diffusion

Chen-Bin Feng1,2*,   Youyang Sha1*,   Longfei Liu1,   Yongjun Yu1,   Chi Man Vong2†,   Xuanlong Yu1†,   Xi Shen1†

1. Intellindust AI Lab    2. University of Macau
* Equal Contribution    † Corresponding Author




FSOD-VFM is a framework for few-shot object detection that leverages powerful vision foundation models (VFMs). It integrates three key components:

🔹 Universal Proposal Network (UPN) for category-agnostic bounding box generation
🔹 SAM2 for accurate mask extraction
🔹 DINOv2 features for efficient adaptation to novel object categories

To address over-fragmentation in proposals, FSOD-VFM introduces a novel graph-based confidence reweighting strategy for refining detections.
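For intuition, the matching step above boils down to scoring UPN proposal crops against DINOv2 prototypes built from the support examples. Below is a minimal sketch of that idea; it is not the repository's code, and embed, support_crops, and proposal_crops are illustrative names only.

# Illustrative sketch: DINOv2 prototype matching (not the repo's exact code)
import torch
import torchvision.transforms as T

# Load DINOv2 ViT-L/14 from torch.hub (fetches pretrained weights)
dino = torch.hub.load("facebookresearch/dinov2", "dinov2_vitl14").eval()

preprocess = T.Compose([
    T.Resize((224, 224)),          # 224 is divisible by the 14-pixel patch
    T.ToTensor(),
    T.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
])

@torch.no_grad()
def embed(crops):
    """L2-normalized DINOv2 CLS embeddings for a list of PIL image crops."""
    batch = torch.stack([preprocess(c) for c in crops])
    feats = dino(batch)                                  # (N, 1024)
    return torch.nn.functional.normalize(feats, dim=-1)

# support_crops: few-shot exemplars of one class; proposal_crops: UPN boxes
# cropped from the query image (both hypothetical inputs for this sketch).
# prototype = embed(support_crops).mean(0, keepdim=True)
# scores    = embed(proposal_crops) @ prototype.T        # cosine similarity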

If you find our work useful, please give us a ⭐!


Overview


🚀 Updates

  • [2026.2.3] Initial release of FSOD-VFM.

🧭 Table of Contents

  1. Datasets
  2. Quick Start
  3. Usage
  4. Citation
  5. Acknowledgement

1. Datasets

Put all datasets under FSOD-VFM/dataset/:

git clone https://github.com/Intellindust-AI-Lab/FSOD-VFM
cd FSOD-VFM 
mkdir dataset

Pascal VOC

Download Pascal VOC from http://host.robots.ox.ac.uk/pascal/VOC,
then put it under dataset/ following this structure:

    dataset/PascalVOC/
    ├── VOC2007/
    ├── VOC2007Test/
    │   └── VOC2007/
    │       ├── JPEGImages/
    │       └── ...
    └── VOC2012/

COCO

Download COCO from https://cocodataset.org and organize it as:

dataset/coco/
├── annotations/
├── train2017/
├── val2017/
└── test2017/

CD-FSOD

Download CD-FSOD from https://yuqianfu.com/CDFSOD-benchmark/, and organize as:

dataset/CDFSOD/
├── ArTaxOr/
├── clipart1k/
├── DIOR/
├── FISH/
├── NEU-DET/
└── UODD/

2. Quick Start

Environment Setup

conda env create -f fsod.yml
conda activate FSODVFM

DINOv2 Installation

# Ensure this is run inside the FSOD-VFM directory
git clone https://github.com/facebookresearch/dinov2.git

UPN Installation

conda install -c conda-forge gcc=9.5.0 gxx=9.5.0 ninja -y
cd chatrex/upn/ops
pip install -v -e .

SAM2 Installation

# Ensure this is run outside the FSOD-VFM directory
cd ../../../../
git clone https://github.com/facebookresearch/sam2.git && cd sam2
pip install -e .

Checkpoints

# The previous step left us inside sam2, a sibling of the project root.
# Make sure the checkpoints folder ends up inside the project root (FSOD-VFM/checkpoints).
cd ../FSOD-VFM && mkdir checkpoints && cd checkpoints
wget https://dl.fbaipublicfiles.com/segment_anything_2/092824/sam2.1_hiera_large.pt
wget https://dl.fbaipublicfiles.com/dinov2/dinov2_vitl14/dinov2_vitl14_pretrain.pth
wget https://github.com/IDEA-Research/ChatRex/releases/download/upn-large/upn_large.pth
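As a quick sanity check that the downloads succeeded, the snippet below loads the SAM2 predictor and the DINOv2 state dict from the project root. The sam2.1_hiera_l.yaml config name is the one shipped with the sam2 repo for this checkpoint; adjust the paths if your layout differs.

# Optional sanity check, run from the FSOD-VFM project root
import torch
from sam2.build_sam import build_sam2
from sam2.sam2_image_predictor import SAM2ImagePredictor

sam2_model = build_sam2("configs/sam2.1/sam2.1_hiera_l.yaml",
                        "checkpoints/sam2.1_hiera_large.pt", device="cpu")
predictor = SAM2ImagePredictor(sam2_model)

# The DINOv2 pretrain file should be a plain state dict of weight tensors
state = torch.load("checkpoints/dinov2_vitl14_pretrain.pth", map_location="cpu")
print(type(predictor).__name__, "ready;", len(state), "DINOv2 tensors loaded")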

3. Usage

Pascal VOC

sh run_scripts/run_pascal.sh

Tips:

  • Modify --json_path for different splits (split1, split2, split3) and shot settings (1shot, 5shot, etc.).

  • Modify --target categories for different splits.

  • Adjust hyperparameters:

    • --min_threshold: UPN confidence threshold (default: 0.01)
    • --alp: alpha for graph diffusion
    • --lamb: decay parameter for graph diffusion (see the reweighting sketch after this list)
  • To fix CRLF line-ending issues in the shell script:

    sed -i 's/\r$//' run_scripts/run_pascal.sh
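The graph diffusion behind --alp and --lamb can be pictured as propagating confidences over a proposal-similarity graph, so that a fragmented proposal surrounded by confident, similar neighbors gets pulled up. The sketch below shows one common form of such an update; it is an illustration under assumed semantics for alp (mixing weight) and lamb (per-step decay), not the repository's exact formula.

# Illustrative sketch: confidence reweighting by graph diffusion
import torch

def diffuse_scores(sim, scores, alp=0.5, lamb=0.9, steps=10):
    """Propagate detection confidences over a proposal-similarity graph.

    sim:    (N, N) pairwise proposal similarity (e.g., feature cosine or IoU)
    scores: (N,)   initial detection confidences
    alp:    mixing weight between diffused and original scores
    lamb:   per-step decay on the propagated signal
    """
    adj = sim / sim.sum(dim=1, keepdim=True).clamp(min=1e-8)  # row-normalize
    s = scores.clone()
    for t in range(steps):
        s = (1 - alp) * scores + alp * (lamb ** t) * (adj @ s)
    return s

# Example: proposals 0 and 1 overlap heavily; the low-scoring fragment (index 1)
# is reweighted upward by its confident neighbor, while proposal 2 is untouched.
sim = torch.tensor([[1.0, 0.8, 0.1],
                    [0.8, 1.0, 0.1],
                    [0.1, 0.1, 1.0]])
scores = torch.tensor([0.9, 0.2, 0.7])
print(diffuse_scores(sim, scores))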

COCO

sh run_scripts/run_coco.sh

Tips:

  • Modify --json_path for 10shot or 30shot.
  • Target categories are fixed to the standard 20 novel COCO classes.

CD-FSOD

sh run_scripts/run_cdfsod.sh

Tips:

  • Modify --json_path, --test_json, and --test_img_dir for different subsets (e.g., ArTaxOr, DIOR).

  • For DIOR, use:

    --test_img_dir ./dataset/CDFSOD/DIOR/test/new_test/
    

4. Citation

If you use FSOD-VFM in your research, please cite:

@inproceedings{feng2025fsodvfm,
  title={Few-Shot Object Detection with Vision Foundation Models and Graph Diffusion},
  author={Feng, Chen-Bin and Sha, Youyang and Liu, Longfei and Yu, Yongjun and Vong, Chi Man and Yu, Xuanlong and Shen, Xi},
  booktitle={The Fourteenth International Conference on Learning Representations},
  year={2026}
}

5. Acknowledgement

Our work builds upon excellent open-source projects including No-Time-To-Train, SAM2, ChatRex, and DINOv2. We sincerely thank their authors for their contributions to the community.
