    Repositories list

    • The official implementation for the paper "What does RL improve for Visual Reasoning? A Frankenstein-Style Analysis"
      Python
      2600 Updated Feb 18, 2026
    • FaSTAR

      Public
      [ICLR 2026] Fast-Slow Toolpath Agent with Subroutine Mining for Efficient Multi-turn Image Editing
      Jupyter Notebook
      22900 Updated Feb 6, 2026
    • TSRBench

      Public
      TSRBench: A Comprehensive Multi-task Multi-modal Time Series Reasoning Benchmark for Generalist Models
      Python
      01600 Updated Jan 30, 2026
    • VREX

      Public
      V-REX: Benchmarking Exploratory Visual Reasoning via Chain-of-Questions
      Python
      0700 Updated Dec 15, 2025
    • RoMA

      Public
      Code for "Routing Manifold Alignment Improves Generalization of Mixture-of-Experts LLMs"
      Jupyter Notebook
      31620 Updated Nov 6, 2025
    • Code for "ChartAB: A Benchmark for Chart Grounding & Dense Alignment"
      Jupyter Notebook
      1500 Updated Nov 4, 2025
    • HallusionBench

      Public
      [CVPR'24] HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision), LLaVA-1.5, and …
      Python
      932510 Updated Oct 14, 2025
    • RuleR

      Public
      [NAACL'25] RuleR: Improving LLM Controllability by Rule-based Data Recycling
      Python
      11410 Updated Sep 27, 2025
    • Mosaic-IT

      Public
      [ACL'25] Mosaic-IT: Cost-Free Compositional Data Synthesis for Instruction Tuning
      Python
      42000 Updated Sep 27, 2025
    • [NeurIPS'25] ColorBench: Can VLMs See and Understand the Colorful World? A Comprehensive Benchmark for Color Perception, Reasoning, and Robustness
      Python
      03000 Updated Sep 27, 2025
    • DisCL

      Public
      [ICCV 2025] Diffusion Curriculum (DisCL)
      Jupyter Notebook
      01730 Updated Sep 26, 2025
    • [ACL'24] Superfiltering: Weak-to-Strong Data Filtering for Fast Instruction-Tuning
      Python
      1618900 Updated Jun 25, 2025
    • [NAACL'24] Self-data filtering of LLM instruction-tuning data using a novel perplexity-based difficulty score, without using any other models
      Python
      2741640 Updated Jun 25, 2025
    • [COLM'25] Missing Premise exacerbates Overthinking: Are Reasoning Models losing Critical Thinking Skill?
      Python
      13710 Updated Jun 5, 2025
    • C3PO

      Public
      [COLM 2025] "C3PO: Critical-Layer, Core-Expert, Collaborative Pathway Optimization for Test-Time Expert Re-Mixing"
      Jupyter Notebook
      12010 Updated Apr 9, 2025
    • CoSTAR

      Public
      Cost-Sensitive Toolpath Agent for Multi-turn Image Editing
      Jupyter Notebook
      12600 Updated Mar 26, 2025
    • R2-T2

      Public
      [ICML 2025] Code for "R2-T2: Re-Routing in Test-Time for Multimodal Mixture-of-Experts"
      Python
      21910 Updated Mar 10, 2025
    • MosT

      Public
      Code for "Many-objective multi-solution transport"
      Python
      0200 Updated Feb 28, 2025
    • BenTo

      Public
      [ICLR 2025] "BENTO: benchmark reduction with in-context learning transferability"
      Python
      0500 Updated Oct 18, 2024
    • [ICLR 2025 Oral] "Your Mixture-of-Experts LLM Is Secretly an Embedding Model For Free"
      Python
      119011 Updated Oct 15, 2024
    • DEBATunE

      Public
      [ACL'24] Can LLMs Speak For Diverse People? Tuning LLMs via Debate to Generate Controllable Controversial Statements
      Python
      32400 Updated Sep 14, 2024
    • [ACL'24] Selective Reflection-Tuning: Student-Selected Data Recycling for LLM Instruction-Tuning
      Python
      3036600 Updated Sep 6, 2024
    • mctune

      Public
      [ACL'24] Multi-Objective Linguistic Control of Large Language Models
      Python
      0200 Updated Jun 30, 2024