CVPR 2026 decisions are now available on OpenReview! Acceptance rate: 25.42% = 4090 / 16092
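Note 1: Everyone is welcome to open an issue to share CVPR 2026 papers and open-source projects!
Note 2: For papers from previous years' top CV conferences, other high-quality CV papers, and survey roundups, see: https://github.com/amusi/daily-paper-computer-vision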
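Scan the QR code to join the CVer Academic Exchange Group for the latest work from CVPR 2026 and beyond! This is the largest computer vision AI community on Knowledge Planet. It is updated daily and shares the newest learning materials on computer vision, AIGC, diffusion models, multimodal learning, deep learning, autonomous driving, medical imaging, remote sensing, and more. Join now and start learning!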
- 3DGS (Gaussian Splatting)
- Agent
- Avatars
- Backbone
- CLIP
- Mamba
- Embodied AI
- GAN
- GNN
- Multimodal Large Language Models (MLLM)
- Large Language Models (LLM)
- NAS
- OCR
- NeRF
- DETR
- Diffusion Models
- ReID (Re-Identification)
- Long-Tail Distribution
- Vision Transformer
- Vision-Language
- Self-supervised Learning
- Data Augmentation
- Object Detection
- Anomaly Detection
- Visual Tracking
- Semantic Segmentation
- Instance Segmentation
- Panoptic Segmentation
- Medical Image
- Medical Image Segmentation
- Video Object Segmentation
- Video Instance Segmentation
- Referring Image Segmentation
- Image Matting
- Image Editing
- Low-level Vision
- Super-Resolution
- Denoising
- Deblurring
- Autonomous Driving
- 3D Point Cloud
- 3D Object Detection
- 3D Semantic Segmentation
- 3D Object Tracking
- 3D Semantic Scene Completion
- 3D Registration
- 3D Human Pose Estimation
- 3D Human Mesh Estimation
- 3D Visual Grounding
- Image Generation
- Video Generation
- 3D Generation
- Video Understanding
- Action Detection
- Embodied AI
- Remote Sensing
- Text Detection
- Knowledge Distillation
- Model Pruning
- Image Compression
- 3D Reconstruction
- Depth Estimation
- Trajectory Prediction
- Lane Detection
- Image Captioning
- Visual Question Answering
- Sign Language Recognition
- Video Prediction
- Novel View Synthesis
- Zero-Shot Learning
- Stereo Matching
- Feature Matching
- Low-light Image Enhancement
- Scene Graph Generation
- Style Transfer
- Implicit Neural Representations
- Image Quality Assessment
- Video Quality Assessment
- Compressive Sensing
- Datasets
- New Tasks
- Others

StructXLIP: Enhancing Vision-language Models with Multimodal Structural Cues

ApET: Approximation-Error Guided Token Compression for Efficient VLMs

# 3D Visual Grounding

ExpPortrait: Expressive Portrait Generation via Personalized Representation
- Paper: https://arxiv.org/abs/2602.19900
- Code:

tttLRM: Test-Time Training for Long Context and Autoregressive 3D Reconstruction
- Project: https://cwchenwang.github.io/tttLRM/
- Paper: https://arxiv.org/abs/2602.20160
- Code: https://github.com/cwchenwang/tttLRM

Flow3r: Factored Flow Prediction for Scalable Visual Geometry Learning
- Project: https://flow3r-project.github.io/
- Paper: https://arxiv.org/abs/2602.20157
- Code: https://github.com/Kidrauh/flow3r

RAP: Fast Feedforward Rendering-Free Attribute-Guided Primitive Importance Score Prediction for Efficient 3D Gaussian Splatting Processing

Brewing Stronger Features: Dual-Teacher Distillation for Multispectral Earth Observation
- Paper: https://arxiv.org/abs/2602.19863
- Code: None

Decoupling Defense Strategies for Robust Image Watermarking
- Paper: https://arxiv.org/abs/2602.20053
- Code: None

Multi-Modal Representation Learning via Semi-Supervised Rate Reduction for Generalized Category Discovery
- Paper: https://arxiv.org/abs/2602.19910
- Code:

The Invisible Gorilla Effect in Out-of-distribution Detection
