Open-source / Simple / Lightweight / Easy-to-use / Extensible
- 📢 News!
- Simplicity
- Introduction
- All task training results
- Training GPU server
- Environments
- Download pretrained models and experiments checkpoints/logs
- Download datasets
- How to use gradio demo
- How to use inference demo
- How to train or test model
- Prepare datasets
- Reference
- My column
- Citation
- 2026/02/02: update dinov3 backbone implementation in SimpleAICV/detection/models/backbones.
- 2026/02/02: update SAM(segment_anything)/SAM_Matting model training pipeline and jupyter example in 13.interactive_segmentation_training.
- 2026/02/02: update SAM2(segment_anything2)/SAM2_Matting model training pipeline and jupyter example in 14.video_interactive_segmentation_training.
- 2026/02/02: update universal_segmentation/universal_matting model training pipeline in 16.universal_segmentation_training.
- 2026/02/02: update all task gradio demos in gradio_demo.
- 2026/02/02: update all task inference demos in inference_demo.
This repository maintains a lightweight codebase. It requires only Python and PyTorch as core dependencies (no third-party frameworks like MMCV).
This repository provides simple training and testing examples for the following tasks:
| task | support dataset | support model |
|---|---|---|
| 00.classification_training | CIFAR100, ImageNet1K(ILSVRC2012), ImageNet21K(Winter 2021 release) | DarkNet, ResNet, Convformer, VAN, ViT |
| 01.distillation_training | ImageNet1K(ILSVRC2012) | DML loss(ResNet), KD loss(ResNet) |
| 02.masked_image_modeling_training | ImageNet1K(ILSVRC2012) | MAE(ViT) |
| 03.detection_training | COCO2017, Objects365(v2,2020), VOC2007&VOC2012 | RetinaNet, FCOS, DETR |
| 04.semantic_segmentation_training | ADE20K, COCO2017 | pfan_semantic_segmentation |
| 05.instance_segmentation_training | COCO2017 | SOLOv2, YOLACT |
| 06.salient_object_detection_training | combined dataset | pfan_segmentation |
| 07.human_matting_training | combined dataset | pfan_matting |
| 08.ocr_text_detection_training | combined dataset | DBNet |
| 09.ocr_text_recognition_training | combined dataset | CTC_Model |
| 10.face_detection_training | combined dataset | RetinaFace |
| 11.face_parsing_training | CelebAMask-HQ, FaceSynthetics | pfan_face_parsing |
| 12.human_parsing_training | CIHP, LIP | pfan_human_parsing |
| 13.interactive_segmentation_training | combined dataset | SAM(segment_anything), SAM_Matting |
| 14.video_interactive_segmentation_training | combined dataset | SAM2(segment_anything2), SAM2_Matting |
| 16.universal_segmentation_training | combined dataset | universal_segmentation, universal_matting |
See all task training results in RESULTS.md.
1、1-8 RTX 4090D (24GB) GPUs, Python 3.12, PyTorch 2.5.1, CUDA 12.4, Ubuntu 22.04 (for most experiments).
2、8 RTX PRO 6000 (96GB) GPUs, Python 3.12, PyTorch 2.8.0, CUDA 12.8, Ubuntu 22.04 (for 13.interactive_segmentation_training/14.video_interactive_segmentation_training/16.universal_segmentation_training).
1、Supported Python and PyTorch versions: Python>=3.12, PyTorch>=2.5.1.
2、Most experiments support only Single-Node Single-GPU training or Single-Node Multi-GPU DDP training; 13.interactive_segmentation_training/14.video_interactive_segmentation_training also support Multi-Node Multi-GPU DDP training (requires InfiniBand/RoCE).
3、Create a conda environment:
conda create -n SimpleAICV python=3.12
4、Install PyTorch:
conda install pytorch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 pytorch-cuda=12.4 -c pytorch -c nvidia
To install a different PyTorch version, find the matching command here:
https://pytorch.org/get-started/previous-versions/
5、Install other Packages:
pip install -r requirements.txt
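After installation, you can sanity-check the environment with a short script (a minimal sketch; nothing here is repo-specific):
# Quick environment sanity check; expects PyTorch>=2.5.1 with CUDA available.
import torch

print(torch.__version__)          # should be >= 2.5.1
print(torch.version.cuda)         # should match the installed CUDA toolkit, e.g. 12.4
print(torch.cuda.is_available())  # should be True on a GPU machine
print(torch.cuda.device_count())  # number of visible GPUs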
You can download all my pretrained models and experiment checkpoints/logs from huggingface or Baidu-Netdisk.
If you only need the pretrained models (model.state_dict()), you can download the pretrained_models folder.
# huggingface
https://huggingface.co/zgcr654321/00.classification_training/tree/main
https://huggingface.co/zgcr654321/01.distillation_training/tree/main
https://huggingface.co/zgcr654321/02.masked_image_modeling_training/tree/main
https://huggingface.co/zgcr654321/03.detection_training/tree/main
https://huggingface.co/zgcr654321/04.semantic_segmentation_training/tree/main
https://huggingface.co/zgcr654321/05.instance_segmentation_training/tree/main
https://huggingface.co/zgcr654321/06.salient_object_detection_training/tree/main
https://huggingface.co/zgcr654321/07.human_matting_training/tree/main
https://huggingface.co/zgcr654321/08.ocr_text_detection_training/tree/main
https://huggingface.co/zgcr654321/09.ocr_text_recognition_training/tree/main
https://huggingface.co/zgcr654321/10.face_detection_training/tree/main
https://huggingface.co/zgcr654321/11.face_parsing_training/tree/main
https://huggingface.co/zgcr654321/12.human_parsing_training/tree/main
https://huggingface.co/zgcr654321/13.interactive_segmentation_training/tree/main
https://huggingface.co/zgcr654321/14.video_interactive_segmentation_training/tree/main
https://huggingface.co/zgcr654321/16.universal_segmentation_training/tree/main
https://huggingface.co/zgcr654321/pretrained_models/tree/main
# Baidu-Netdisk
Link: https://pan.baidu.com/s/17oSFXgIy1vxUdPUhTzRkdw?pwd=3l99
Extraction code: 3l99
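If you prefer downloading from huggingface in a script, here is a minimal sketch using the huggingface_hub package (an assumption: it is not in requirements.txt; install it with pip install huggingface_hub):
# Download one weights repo with huggingface_hub (optional extra package).
from huggingface_hub import snapshot_download

# Fetch the pretrained_models repo into a local folder.
local_dir = snapshot_download(
    repo_id="zgcr654321/pretrained_models",
    local_dir="./pretrained_models",
)
print("weights saved to", local_dir)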
You can download all datasets from Baidu-Netdisk.
# Baidu-Netdisk
Link: https://pan.baidu.com/s/1zjwdVNliOMS3xwuuY41gcA?pwd=z9sa
Extraction code: z9sa
cd to the gradio_demo folder, which contains:
00.gradio_classify_single_image.py
03.gradio_detect_single_image.py
04.gradio_semantic_segment_single_image.py
05.gradio_instance_segment_single_image.py
06.gradio_salient_object_detection_single_image.py
07.gradio_human_matting_single_image.py
08.gradio_ocr_text_detect_single_image.py
09.gradio_ocr_text_recognition_single_image.py
10.gradio_face_detect_single_image.py
11.gradio_face_parsing_single_image.py
12.gradio_human_parsing_single_image.py
13.0.0.gradio_sam_point_target_single_image.py
13.0.1.gradio_sam_circle_target_single_image.py
16.0.gradio_universal_segment_single_image.py
16.1.gradio_universal_matting_single_image.py
For example, you can run 03.gradio_detect_single_image.py (prepare the pretrained model weight first and modify the pretrained model load path):
python 03.gradio_detect_single_image.py
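Modifying the load path means pointing the script at your downloaded weight file. The loading step looks roughly like the sketch below (the model class and path are illustrative stand-ins, not the demo's exact code):
# Illustrative weight-loading step; each demo script defines its own model
# class and load path variable.
import torch
from torchvision.models import resnet50  # stand-in for the repo's model class

model = resnet50(num_classes=1000)
checkpoint = torch.load("/path/to/pretrained_model.pth", map_location="cpu")
model.load_state_dict(checkpoint)
model.eval()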
cd to the inference_demo folder, which contains:
00.inference_classify_single_image.py
03.inference_detect_single_image.py
04.inference_semantic_segment_single_image.py
05.inference_instance_segment_single_image.py
06.inference_salient_object_detection_single_image.py
07.inference_human_matting_single_image.py
08.inference_ocr_text_detect_single_image.py
09.inference_ocr_text_recognition_single_image.py
10.inference_face_detect_single_image.py
11.inference_face_parsing_single_image.py
12.inference_human_parsing_single_image.py
13.0.inference_sam_single_image.py
16.0.inference_universal_segment_single_image.py
16.1.inference_universal_matting_single_image.py
For example, you can run 03.inference_detect_single_image.py (prepare the pretrained model weight first and modify the pretrained model load path):
python 03.inference_detect_single_image.py
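Each inference script follows the same preprocess/forward/postprocess flow. Here is a minimal classification-style sketch of that flow (torchvision models and transforms stand in for this repository's own):
# Minimal single-image inference sketch; illustrates the flow only, the
# repo's scripts use their own models and preprocessing.
import torch
from PIL import Image
from torchvision import transforms
from torchvision.models import resnet50

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
])

image = preprocess(Image.open("test.jpg").convert("RGB")).unsqueeze(0)
model = resnet50(num_classes=1000).eval()
with torch.no_grad():
    logits = model(image)
print(logits.argmax(dim=1).item())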
If you want to train or test a model, enter a training experiment folder and run train.sh or test.sh.
For example, enter the folder 00.classification_training/imagenet/resnet50.
If you want to train a model from scratch, delete the checkpoints and log folders first, then run train.sh:
CUDA_VISIBLE_DEVICES=0,1 torchrun \
--nproc_per_node=2 \
--master_addr 127.0.1.0 \
--master_port 10000 \
../../../tools/train_classification_model.py \
--work-dir ./
If you want to test a model, you need a trained model first; modify trained_model_path in test_config.py, then run test.sh:
CUDA_VISIBLE_DEVICES=0 torchrun \
--nproc_per_node=1 \
--master_addr 127.0.1.1 \
--master_port 10001 \
../../../tools/test_classification_model.py \
--work-dir ./
CUDA_VISIBLE_DEVICES specifies the GPU IDs used for training. Please make sure nproc_per_node equals the number of GPUs, and that master_addr/master_port are unique for each training run.
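For example, a second run started on the same machine at the same time could use different GPUs and a different master_addr/master_port pair (values below are illustrative):
CUDA_VISIBLE_DEVICES=2,3 torchrun \
--nproc_per_node=2 \
--master_addr 127.0.1.2 \
--master_port 10002 \
../../../tools/train_classification_model.py \
--work-dir ./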
Checkpoints and log folders are saved in your training/testing experiment folder. You can also modify hyperparameters in train_config.py/test_config.py.
Make sure the folder structure is as follows:
CIFAR100
|
|-----train unzip from cifar-100-python.tar.gz
|-----test unzip from cifar-100-python.tar.gz
|-----meta unzip from cifar-100-python.tar.gz
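The three entries come straight out of the official archive; illustrative commands (the rename to CIFAR100 is an assumption to match the layout above):
tar -xzf cifar-100-python.tar.gz
mv cifar-100-python CIFAR100   # contains the train/test/meta pickle files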
Make sure the folder structure is as follows:
ILSVRC2012
|
|-----train----1000 sub classes folders
|-----val------1000 sub classes folders
Please make sure each class uses the same folder name in the train and val folders.
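The names must match because folder-based loaders derive class indices from sorted folder names. Here is a minimal sketch with torchvision's ImageFolder (a stand-in; this repository may use its own dataset class):
# Class indices come from sorted folder names, so train/ and val/ must use
# identical class folder names to get consistent labels.
from torchvision.datasets import ImageFolder

train_set = ImageFolder("ILSVRC2012/train")
val_set = ImageFolder("ILSVRC2012/val")
assert train_set.class_to_idx == val_set.class_to_idx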
Make sure the folder structure is as follows:
ImageNet21K
|
|-----train-----------10450 sub classes folders
|-----val-------------10450 sub classes folders
|-----small_classes---10450 sub classes folders
|-----imagenet21k_miil_tree.pth
Please make sure each class uses the same folder name in the train and val folders.
Make sure the folder structure is as follows:
COCO2017
| |----captions_train2017.json
| |----captions_val2017.json
|--annotations---|----instances_train2017.json
| |----instances_val2017.json
| |----person_keypoints_train2017.json
| |----person_keypoints_val2017.json
|
| |----train2017
|----images------|----val2017
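To verify the annotation files are in place, you can load one with pycocotools (an assumption: install it with pip install pycocotools if needed):
# Sanity-check the COCO2017 layout by loading one annotation file.
from pycocotools.coco import COCO

coco = COCO("COCO2017/annotations/instances_val2017.json")
print(len(coco.getImgIds()), "val images")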
Make sure the folder structure is as follows:
objects365_2020
|
| |----zhiyuan_objv2_train.json
|--annotations---|----zhiyuan_objv2_val.json
| |----sample_2020.json
|
| |----train all train patch folders
|----images------|----val all val patch folders
|----test all test patch folders
Make sure the folder structure is as follows:
VOCdataset
| |----Annotations
| |----ImageSets
|----VOC2007------|----JPEGImages
| |----SegmentationClass
| |----SegmentationObject
|
| |----Annotations
| |----ImageSets
|----VOC2012------|----JPEGImages
| |----SegmentationClass
| |----SegmentationObject
Make sure the folder structure is as follows:
ADE20K
| |----training
|---images--------|----validation
| |----testing
|
| |----training
|---annotations---|----validation
Make sure the folder structure is as follows:
SAMA-COCO
| |----sama_coco_train.json
| |----sama_coco_validation.json
|--annotations---|----train_labels.json
| |----validation_labels.json
| |----test_labels.json
| |----image_info_test2017.json
| |----image_info_test-dev2017.json
|
| |----train
|----images------|----validation
https://github.com/facebookresearch/dinov3
https://github.com/facebookresearch/segment-anything
https://github.com/facebookresearch/sam2
https://github.com/tue-mps/EoMT
https://www.zhihu.com/column/c_1692623656205897728
If you find my work useful in your research, please consider citing:
@inproceedings{zgcr,
title={SimpleAICV-pytorch-training-examples},
author={zgcr},
year={2020-2030}
}