Reasoning-centric evaluation of multimodal unified models across Understanding – Generation Consistency (UGC), Text-to-Image, and Editing, revealing the persistent gap between reasoning and faithful generation.

- 2026/01/26: 🎉 GIR-Bench has been accepted to ICLR 2026!
- 2025/10/14: We have released the evaluation code and the dataset for GIR-Bench.
conda create -n GIR-Bench python==3.10
conda activate GIR-Bench
pip install -r requirement.txt
git clone https://github.com/facebookresearch/dinov3.git
huggingface-cli download --resume --repo-type dataset lihxxx/GIR-Bench --local-dir ./dataset
mkdir weights
huggingface-cli download --resume-download OpenGVLab/InternVL3_5-38B-HF --local-dir ./weights/InternVL3_5-38B-HF
Please download dinov3_vit7b16_pretrain_lvd1689m-a955f4ea.pth from the Meta DINOv3 Downloads page and place it under weights/.
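Before launching the evaluation, it can help to confirm that everything from the setup steps above is in place. The following is a minimal sketch, not part of the released code: the helper name `missing_paths` is hypothetical, and it assumes all setup commands were run from the repository root so that `dinov3/`, `dataset/`, and `weights/` sit side by side.

```python
from pathlib import Path

# Paths taken from the setup commands above. This list assumes the commands
# were run from the repository root; adjust it if you used other locations.
REQUIRED_PATHS = [
    "dinov3",
    "dataset",
    "weights/InternVL3_5-38B-HF",
    "weights/dinov3_vit7b16_pretrain_lvd1689m-a955f4ea.pth",
]


def missing_paths(root="."):
    """Return the required paths that do not exist under root (hypothetical helper)."""
    root = Path(root)
    return [p for p in REQUIRED_PATHS if not (root / p).exists()]


if __name__ == "__main__":
    missing = missing_paths()
    if missing:
        print("Missing:")
        for p in missing:
            print(f"  {p}")
    else:
        print("All expected paths are present.")
```

The check only tests for existence; it does not verify that the downloads are complete.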
bash run_evaluation_gen.sh
bash run_evaluation_edit.sh

Please organize your model outputs as below and put them into the corresponding MODELS_DIR. Default locations:
- t2i: MODELS_DIR=./dataset/generation/t2i
- editing: MODELS_DIR=./dataset/generation/editing
Recommended directory and naming conventions (filenames must align with the task id in the dataset):
dataset/
└── generation/
    ├── t2i/
    │   └── <YourModel>/
    │       ├── SpatialLayout/
    │       │   └── <image_id>.png
    │       ├── NumericalReasoning/
    │       │   └── <image_id>.png
    │       ├── TextRendering/
    │       │   └── <image_id>.png
    │       ├── Zoology/
    │       │   └── <image_id>.png
    │       ├── Botany/
    │       │   └── <image_id>.png
    │       └── Geography/
    │           └── <image_id>.png
    └── editing/
        └── <YourModel>/
            ├── ReasoningPerception/
            │   └── <image_id>.png
            ├── VisualLogic/
            │   └── <image_id>.png
            └── VisualPuzzle/
                └── <image_id>.png
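A quick way to sanity-check this layout before running the evaluation is to count the PNG files in each expected category folder. The sketch below is illustrative only and not part of the released scripts: `check_model_dir` is a hypothetical helper, `YourModel` is a placeholder, and the category names are copied from the tree above. It does not check that filenames match the dataset's task ids.

```python
from pathlib import Path

# Category folders as listed in the directory tree above.
EXPECTED = {
    "t2i": ["SpatialLayout", "NumericalReasoning", "TextRendering",
            "Zoology", "Botany", "Geography"],
    "editing": ["ReasoningPerception", "VisualLogic", "VisualPuzzle"],
}


def check_model_dir(models_dir, model_name, task):
    """Return {category: number of .png files}, using -1 for a missing folder."""
    model_root = Path(models_dir) / model_name
    counts = {}
    for category in EXPECTED[task]:
        folder = model_root / category
        counts[category] = len(list(folder.glob("*.png"))) if folder.is_dir() else -1
    return counts


if __name__ == "__main__":
    # "YourModel" is a placeholder; replace it with your model's folder name.
    for task in EXPECTED:
        result = check_model_dir(f"./dataset/generation/{task}", "YourModel", task)
        for category, n in result.items():
            status = "missing folder" if n < 0 else f"{n} png file(s)"
            print(f"{task}/{category}: {status}")
```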
By default, the scripts evaluate all subfolders under the configured MODELS_DIR. To evaluate only specific models:
- Set the MODELS array in the shell script:
  MODELS=("YourModel1" "YourModel2")
- Enable the --models flag by uncommenting it in each Python call within the script:
  # ... inside each python command block
  --models "${MODELS[@]}"
- Run the script:
  bash run_evaluation_gen.sh
  bash run_evaluation_edit.sh

@article{li2025gir-bench,
title={GIR-Bench: Versatile Benchmark for Generating Images with Reasoning},
author={Hongxiang Li and Yaowei Li and Bin Lin and Yuwei Niu and Yuhang Yang and Xiaoshuang Huang and Jiayin Cai and Xiaolong Jiang and Yao Hu and Long Chen},
journal={arXiv preprint arXiv:2510.11026},
year={2025}
}
