
Ultra-Fast Language Generation via
Discrete Diffusion Divergence Instruct (DiDi-Instruct)


By Haoyang Zheng, Xinyang Liu, Cindy Xiangrui Kong, Nan Jiang, Zheyuan Hu, Weijian Luo, Wei Deng, and Guang Lin


🔄 Updates

  • 2026-02-05: We released the training code.
  • 2026-01-25: DiDi-Instruct was accepted at ICLR 2026.
  • 2026-01-16: An invited talk on DiDi-Instruct is now available on YouTube.
  • 2025-10-06: We updated the blog.
  • 2025-10-05: We released the checkpoint on Hugging Face.
  • 2025-10-03: We updated the evaluation code and released the model checkpoint.
  • 2025-09-29: We uploaded our work to arXiv.

Abstract

Fast and high-quality language generation is the holy grail of the age of AI. In this work, we introduce Discrete Diffusion Divergence Instruct (DiDi-Instruct), a training-based method that initializes from a pre-trained (masked) discrete diffusion language model (dLLM) and distills a few-step student for fast generation. The resulting DiDi-Instruct model matches or surpasses its dLLM teacher and the GPT-2 baseline while enabling up to 64× acceleration. The theoretical foundation of DiDi-Instruct is a novel framework based on integral KL-divergence minimization, which yields a practical training algorithm. We further introduce grouped reward normalization, intermediate-state matching, and a reward-guided ancestral sampler, which significantly improve training stability, model coverage, and inference quality. On OpenWebText, DiDi-Instruct achieves perplexities from 62.2 (8 NFEs) to 18.4 (128 NFEs), outperforming prior accelerated dLLMs and the GPT-2 baseline. These gains come with negligible entropy loss (around 1%) and more than a 20× reduction in additional training wall-clock time compared to competing dLLM distillation methods. We further validate the robustness and effectiveness of DiDi-Instruct through extensive ablation studies, model scaling, downstream tasks, and the generation of discrete protein sequences. In conclusion, DiDi-Instruct is an efficient yet effective distillation method, enabling language generation in the blink of an eye.
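
For orientation, distillation objectives of this family are typically built on an integral KL (IKL) divergence between the student and teacher distributions diffused across noise levels. A schematic form is sketched below; this is the general construction, not necessarily DiDi-Instruct's exact discrete-diffusion formulation, so please refer to the paper for the precise objective and notation:

\mathcal{D}_{\mathrm{IKL}}(p_\theta \,\|\, q) \;=\; \int_0^T w(t)\, \mathbb{E}_{x_t \sim p_{\theta,t}}\!\left[\log \frac{p_{\theta,t}(x_t)}{q_t(x_t)}\right] \mathrm{d}t

where p_{\theta,t} and q_t denote the student's and teacher's distributions at noise level t, and w(t) is a weighting function.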


🚀 Feel the Generation Speed

Auto-Regressive Model (GPT-2 Small)
Token-by-token generation → high latency

[ARM generation demo]

Masked Diffusion Model (MDLM, 169M)
Iterative denoising → faster than GPT-2 Small.

[MDLM generation demo]

DiDi-Instruct (distilled from 169M MDLM)
Distilled few-step student → up to 64× speedup with matched/better quality.

[DiDi-Instruct generation demo]


🏗️ Usage Guide

1. Create and Activate the Conda Environment

Before first use, create and activate the conda environment from the provided environment.yml:

conda env create -f environment.yml
conda activate mask_model
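
An optional sanity check is to confirm that PyTorch sees a GPU inside the new environment; this snippet only assumes that environment.yml installs PyTorch:

# optional sanity check inside the mask_model environment
import torch
print(torch.__version__, torch.cuda.is_available())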

2. Prepare the Teacher Model

You need a pre-trained discrete diffusion language model (dLLM) as the teacher. You have two options:

  • Option A (Train from scratch):

    • Refer to this script from DUO to train your own teacher model on OpenWebText.
    • This produces a checkpoint (e.g., mdlm.ckpt).
  • Option B (Use pre-trained checkpoint):

    • Download a pre-trained checkpoint from Google Drive (mdlm.ckpt).
    • Place it in the ./out/ directory for later use.
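
Either way, the teacher checkpoint should end up at ./out/mdlm.ckpt. A minimal sketch of staging a manually downloaded file (the source path below is a placeholder for wherever your browser saved it):

# stage a downloaded teacher checkpoint under ./out/ (source path is illustrative)
from pathlib import Path
import shutil

src = Path.home() / "Downloads" / "mdlm.ckpt"   # placeholder download location
dst = Path("out") / "mdlm.ckpt"
dst.parent.mkdir(parents=True, exist_ok=True)
shutil.copy2(src, dst)
print(f"teacher checkpoint staged at {dst}")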

3. Distill the Student Model

Once you have the teacher model checkpoint, distill a few-step student model for fast inference:

bash ./scripts/distill-didi-instruct-owt.sh

The script will look for the teacher checkpoint and begin the distillation process.

Alternatively, skip distillation and use a pre-trained DiDi-Instruct checkpoint:

  • Option 1 (from Google Drive):

    • Download the distilled DiDi-Instruct checkpoint from Google Drive (didi-instruct.ckpt).
    • Place the .ckpt file in the ./out/ directory.
  • Option 2 (from Hugging Face):

    • We provide the distilled model on Hugging Face.

    • Convert it to .ckpt format using:

      python ./models/hf_to_ckpt.py --hf_repo_id "haoyangzheng/didi-instruct-small" --output_dir "./out/didi-instruct.ckpt"
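
      After conversion, an optional check is to confirm that the file deserializes as expected; the snippet below is just a sketch using torch.load, not part of the repository's tooling, and the key names it prints will vary:

      # optional: confirm the converted checkpoint loads and peek at its keys
      # (on recent PyTorch you may need torch.load(..., weights_only=False))
      import torch

      ckpt = torch.load("./out/didi-instruct.ckpt", map_location="cpu")
      if isinstance(ckpt, dict):
          print(list(ckpt.keys())[:10])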

4. Evaluate the Model

Evaluate the distilled student model by measuring perplexity and entropy and comparing against the teacher and baseline models:

bash ./scripts/eval-didi-instruct.sh

This produces performance metrics on the OpenWebText validation set.
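
For reference, generative perplexity for diffusion language models is commonly scored with an off-the-shelf autoregressive model such as GPT-2. The sketch below illustrates that idea on a single sample; it is not the repository's evaluation code, and the scorer choice and sample text are placeholders:

# score one generated sample with GPT-2 as an external likelihood model
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

sample = "Generated text from the distilled student goes here."  # placeholder
tok = GPT2TokenizerFast.from_pretrained("gpt2-large")
model = GPT2LMHeadModel.from_pretrained("gpt2-large").eval()

ids = tok(sample, return_tensors="pt").input_ids
with torch.no_grad():
    loss = model(ids, labels=ids).loss      # mean token-level NLL
print(f"generative perplexity: {torch.exp(loss).item():.2f}")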


📁 Repository Structure

didi-instruct-train/
├── configs/    # Configuration files
├── models/     # Model implementations
├── scripts/    # Training and evaluation scripts
├── out/        # Checkpoints and logs
├── algo.py     # Core algorithm implementations
├── dataloader.py
├── main.py
├── metrics.py
├── trainer_base.py
├── utils.py
├── environment.yml
├── README.md
└── LICENSE.md

📚 References

This repository is built upon DUO ("The Diffusion Duality", ICML 2025).

We also adopt ideas from DiMO, MDLM, SDTT, and nanoGPT.


📖 Citation

If you find this repository useful, please cite the following work:

@article{zheng2025ultra,
  title={{Ultra-Fast Language Generation via Discrete Diffusion Divergence Instruct}},
  author={Zheng, Haoyang and Liu, Xinyang and Kong, Cindy Xiangrui and Jiang, Nan and Hu, Zheyuan and Luo, Weijian and Deng, Wei and Lin, Guang},
  journal={{Proceedings of the International Conference on Learning Representations (ICLR)}},
  year={2026}
}
