[Paper]
This paper proposes a multi-view collaborative matching strategy for reliable track construction in complex scenarios. We observe that the pairwise matching paradigm commonly applied to image-set matching often produces ambiguous estimates when the selected independent pairs exhibit significant occlusion or extreme viewpoint changes. This challenge stems primarily from the inherent uncertainty of interpreting intricate 3D structure from limited two-view observations, as the 3D-to-2D projection entails significant information loss. To address this, we introduce CoMatcher, a deep multi-view matcher that (i) leverages complementary context cues from different views to form a holistic 3D scene understanding and (ii) exploits cross-view projection consistency to infer a reliable global solution. Building on CoMatcher, we develop a groupwise framework that fully exploits cross-view relationships for large-scale matching tasks. Extensive experiments on various complex scenarios demonstrate the superiority of our method over the mainstream two-view matching paradigm.
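To make the groupwise idea concrete, here is a toy sketch (purely illustrative; none of these names appear in the repository). A "feature" is just an integer id, two views "match" on the ids they share, and the groupwise variant keeps only matches supported across the whole group, a crude stand-in for cross-view projection consistency:

```python
# Toy sketch of pairwise vs. groupwise matching (NOT the CoMatcher
# implementation). A "feature" is an integer id; two views match on
# the ids they share.

def pairwise_matches(target, source):
    """Match one source view against the target, independently of all others."""
    return target & source

def groupwise_matches(target, sources):
    """Match the whole group jointly: keep only ids observed consistently in
    the target and every source view -- a crude stand-in for enforcing
    cross-view projection consistency."""
    consistent = set(target)
    for source in sources:
        consistent &= source
    return consistent

# A "3-to-1" group: one target view plus three source views.
target = {1, 2, 3, 4, 5}
sources = [{1, 2, 3, 9}, {2, 3, 4, 9}, {1, 2, 3, 4, 7}]

print([sorted(pairwise_matches(target, s)) for s in sources])  # per-pair results
print(sorted(groupwise_matches(target, sources)))              # -> [2, 3]
```

Here the independent pairwise results disagree across pairs, while the joint pass keeps only the mutually consistent ids; the real model, of course, reasons over learned descriptors and scene geometry rather than id sets.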
## Installation

```bash
git clone https://github.com/EATMustard/CoMatcher.git
cd comatcher
conda env create -f environment.yml
```

## Evaluation

### HPatches

When running an evaluation script, the dataset is automatically downloaded to the `data` directory by default. We provide two evaluation scripts: one for pairwise matching and one for single-forward multi-view matching.
```bash
# multi-view matching with CoMatcher
python -m src.eval.multiview_hpatches --conf superpoint+comatcher --overwrite

# pairwise methods, e.g. SuperPoint+LightGlue
python -m src.eval.hpatches --conf superpoint+lightglue --overwrite
```

Additionally, for homography estimation, if you wish to use PoseLib instead of OpenCV's estimator, use the following command (refer to Glue Factory for more details):
```bash
python -m src.eval.multiview_hpatches --conf superpoint+comatcher --overwrite \
    eval.estimator=poselib eval.ransac_th=-1
```

### MegaDepth

Since the original MegaDepth-1500 dataset is designed for two-view matching, we have recaptured multi-view images from its scenes for evaluation. This forms a "3-to-1" quadruple dataset, Multi-View MegaDepth-1500, which contains 4500 image pairs. We again provide evaluation scripts for both approaches:
```bash
python -m src.eval.multiview_megadepth1500 --conf superpoint+comatcher --overwrite
```

To evaluate two-view matchers on the same dataset (4500 pairs), e.g. the SuperPoint+LightGlue model:
```bash
python -m src.eval.megadepth4500 --conf superpoint+lightglue-official --overwrite
```

## Training

CoMatcher follows a two-step training strategy: pre-training on a synthetic dataset, followed by fine-tuning on MegaDepth.
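Conceptually, the second stage initializes its weights from the experiment trained in the first. A minimal sketch of that chaining, with hypothetical helper names (the real training loop lives in the Glue Factory-based `src.train` module):

```python
# Hypothetical sketch of the two-stage schedule; run_stage is illustrative
# and does not exist in the repository.

def run_stage(experiment, conf, load_experiment=None):
    """Pretend trainer: records which config a stage uses and which earlier
    experiment (if any) it initializes its weights from."""
    return {"experiment": experiment, "conf": conf, "init_from": load_experiment}

# Stage 1: pre-train on synthetic homographies, from scratch.
stage1 = run_stage(
    "sp+comatcher_homography",
    "src/configs/multi-view/superpoint+comatcher_homography.yaml",
)

# Stage 2: fine-tune on MegaDepth, initialized from stage 1
# (the role played by train.load_experiment on the command line below).
stage2 = run_stage(
    "sp+comatcher_megadepth",
    "src/configs/multi-view/superpoint+comatcher_megadepth.yaml",
    load_experiment=stage1["experiment"],
)

print(stage2["init_from"])  # -> sp+comatcher_homography
```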
```bash
# 1. Pre-training on synthetic homographies; "sp+comatcher_homography" is the experiment name.
python -m src.train sp+comatcher_homography \
    --conf src/configs/multi-view/superpoint+comatcher_homography.yaml

# 2. Fine-tuning on MegaDepth, initialized from the pre-trained experiment.
python -m src.train sp+comatcher_megadepth \
    --conf src/configs/multi-view/superpoint+comatcher_megadepth.yaml \
    train.load_experiment=sp+comatcher_homography
```

To evaluate a trained model, use the `--checkpoint` argument:
```bash
python -m src.eval.multiview_megadepth1500 --checkpoint sp+comatcher
```

## TODO

- End-to-end track construction toolkit embedded in COLMAP
- Pretrained checkpoints
- IMC evaluations
- More baselines
## Acknowledgements

We are grateful to the developers of the Glue Factory codebase for their excellent work; our implementation extends their framework to multi-view scenarios.
## Citation

```bibtex
@inproceedings{zhang2025comatcher,
  title     = {CoMatcher: Multi-View Collaborative Feature Matching},
  author    = {Zhang, Jintao and Xia, Zimin and Dong, Mingyue and Shen, Shuhan and Yue, Linwei and Zheng, Xianwei},
  booktitle = {Proceedings of the Computer Vision and Pattern Recognition Conference},
  pages     = {21970--21980},
  year      = {2025}
}
```