    Repositories list

    • UrbanNav

      Public
      [AAAI 2026] Official implementation of paper "UrbanNav: Learning Language-Guided Embodied Urban Navigation from Web-Scale Human Trajectories"
      Python
      34310 Updated Jan 30, 2026
    • ChatSearch: a Dataset and a Generative Retrieval Model for General Conversational Image Retrieval
      Python
      0800 Updated Jan 6, 2026
    • VRoPE

      Public
      [EMNLP 2025 Main] Official implementation of VRoPE: Rotary Position Embedding for Video Large Language Models.
      Python
      02700 Updated Nov 18, 2025
    • An efficient GRPO training util.
      Python
      25400 Updated Jun 13, 2025
    • VideoNIAH

      Public
      VideoNIAH: A Flexible Synthetic Method for Benchmarking Video MLLMs
      Python
      15440 Updated Mar 9, 2025
    • COSA

      Public
      [ICLR2024] Codes and Models for COSA: Concatenated Sample Pretrained Vision-Language Foundation Model
      Python
      34330 Updated Dec 25, 2024
    • VALOR

      Public
      [TPAMI2024] Codes and Models for VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset
      Python
      1830770 Updated Dec 25, 2024
    • DANet

      Public
      Dual Attention Network for Scene Segmentation (CVPR2019)
      Python
      4842.5k611 Updated Dec 23, 2024
    • MRES

      Public
      This repo holds the official code and data for "Unveiling Parts Beyond Objects: Towards Finer-Granularity Referring Expression Segmentation", accepted by CVPR 2…
      07250 Updated Jun 3, 2024
    • SC-Tune

      Public
      Official code for CVPR 2024 paper, "SC-Tune: Unleashing Self-Consistent Referential Comprehension in Large Vision Language Models"
      Python
      11610 Updated Apr 22, 2024
    • VAST

      Public
      [NIPS2023] Code and Model for VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset
      Jupyter Notebook
      18298220 Updated Mar 14, 2024
    • GLOBER

      Public
      Python
      0910 Updated Jan 11, 2024
    • ChatBridge, an approach to learning a unified multimodal model to interpret, correlate, and reason about various modalities without relying on all combinations…
      Python
      15460 Updated Sep 4, 2023
    • Official PyTorch implementation of the paper "Enhancing Vision-Language Pre-Training with Jointly Learned Questioner and Dense Captioner"
      Python
      11510 Updated Aug 9, 2023
    • MOSO

      Public
      Python
      23550 Updated Jun 6, 2023