
⍴-EOS: Training-free Bidirectional Variable-Length
Control for Masked Diffusion LLMs


1 Shanghai AI Laboratory    2 Fudan University



💡 ⍴-EOS is a training-free, single-stage strategy for bidirectional variable-length control via the implicit EOS density (⍴). It unlocks dynamic, variable-length generation for masked dLLMs (e.g., LLaDA), achieving performance comparable to, and sometimes superior to, meticulously tuned fixed-length baselines.

📖 Click for the full abstract of ⍴-EOS

Beyond parallel generation and global context modeling, current masked diffusion large language models (dLLMs) suffer from a fundamental limitation: they require a predefined, fixed generation length, which lacks flexibility and forces an inevitable trade-off between output quality and computational efficiency. To address this, we study the denoising dynamics and find that the implicit density ($\rho$) of end-of-sequence ($\texttt{EOS}$) tokens serves as a reliable signal of generation sufficiency.

In particular, the evolving implicit $\texttt{EOS}$ density during denoising reveals whether the current masked space is excessive or insufficient, thereby guiding the adjustment direction for generation length. Building on this insight, we propose $\rho$-$\texttt{EOS}$, a training-free, single-stage strategy that enables bidirectional variable-length generation for masked dLLMs.

Unlike prior two-stage approaches, which require separate length-adjustment and iterative mask-insertion phases while supporting only unidirectional expansion, $\rho$-$\texttt{EOS}$ achieves bidirectional length adjustment within a unified denoising process by continuously estimating the implicit $\texttt{EOS}$ density: excessively high density triggers $\texttt{MASK}$ token contraction, while insufficient density induces expansion.

Extensive experiments on mathematics and code benchmarks demonstrate that $\rho$-$\texttt{EOS}$ achieves comparable performance while substantially improving inference efficiency and token utilization.


📢 News

  • [2026/02/07] We released our code!
  • [2026/01/30] We released our paper on arXiv!

💻 Overview


Left (Standard & DAEDAL): Standard denoising requires a fixed generation length, lacking flexibility and forcing an inevitable trade-off between performance and efficiency. DAEDAL uses a two-stage approach (first adjusting length, then iteratively inserting masks) but only supports unidirectional expansion.

Right (⍴-EOS): Our method performs denoising and length adjustment simultaneously within a unified loop. By monitoring the implicit EOS density (⍴EOS), it dynamically expands or contracts the generation length bidirectionally, achieving efficient and flexible variable-length generation.
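As a rough illustration of this loop, here is a minimal PyTorch sketch of the length-adjustment step (not the repository's implementation; the function name adjust_length, the delta step size, and the helper structure are hypothetical, while the 0.3 / 0.7 thresholds mirror the Quick Start example below):

import torch

def adjust_length(x, rho, mask_id, rho_low=0.3, rho_high=0.7, delta=32):
    """Sketch: expand or contract the masked tail based on the implicit EOS density.

    x     : (1, L) partially denoised sequence that still contains MASK tokens
    rho   : implicit EOS density estimated at the current denoising step
    delta : number of trailing MASK tokens to add or remove per adjustment
    """
    if rho > rho_high:
        # Excessively high density: the reserved masked space is larger than needed,
        # so contract by trimming (at most delta) trailing MASK tokens.
        trailing = 0
        for tok in reversed(x[0].tolist()):
            if tok == mask_id and trailing < delta:
                trailing += 1
            else:
                break
        if trailing > 0:
            x = x[:, : x.shape[1] - trailing]
    elif rho < rho_low:
        # Insufficient density: the masked space is likely too small,
        # so expand by appending delta fresh MASK tokens.
        pad = torch.full((1, delta), mask_id, dtype=x.dtype, device=x.device)
        x = torch.cat([x, pad], dim=1)
    return x

In a full decoder this check would run inside the standard denoising loop, once per step or per block, so no separate length-adjustment stage is needed.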

📊 Evolution Trend of Implicit EOS Density


The calculation of implicit EOS density: During the denoising process, we calculate the implicit EOS density among the remaining masked positions: ⍴ = (Implicit EOS at step t) / (Remaining MASK at step t).

(a-b) The trend of implicit EOS density during the denoising process: The figures show how ⍴EOS evolves during generation on the GSM8K and MBPP benchmarks. Different colors represent different generation lengths. The implicit EOS density gradually converges as denoising progresses, providing a reliable signal for determining when the generation is sufficient.
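In code, this density can be read off the model's current predictions over the still-masked positions. A minimal sketch, assuming greedy (argmax) predictions; the function name and arguments are illustrative, not the repository's API:

import torch

def implicit_eos_density(logits, x, mask_id, eos_id):
    """rho = (# masked positions currently predicted as EOS) / (# remaining masked positions)."""
    masked = (x == mask_id)                 # remaining MASK positions at step t
    if masked.sum() == 0:
        return 0.0                          # nothing left to fill
    pred = logits.argmax(dim=-1)            # the model's current best guess for every position
    return (pred[masked] == eos_id).float().mean().item()

For example, if 180 of the 240 remaining MASK positions currently decode to EOS, ⍴ = 0.75.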

📈 Variable-Length Generation Results


The table compares ⍴-EOS with DAEDAL across various initial generation lengths (128, 256, 512, 1024) on both LLaDA-8B and LLaDA-1.5 models. We evaluate on four benchmarks: GSM8K, MATH-500, MBPP, and HumanEval.

Key findings:

  • Robustness to initial length: ⍴-EOS maintains stable performance regardless of the initial generation length
  • Better token efficiency and faster evaluation: ⍴-EOS achieves higher effective token ratios and reduces evaluation runtime
  • Higher accuracy: ⍴-EOS outperforms DAEDAL in most settings on both accuracy (Acc) and effective token ratio (Eratio), achieving the best trade-off between generation quality and efficiency across all benchmarks

⚡ Token Utilization

We compare the generation efficiency of ⍴-EOS against DAEDAL and Baseline methods across four benchmarks. The Effective Ratio measures how efficiently tokens are utilized (non-padding tokens / total tokens).
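For a single sample, this ratio can be computed as in the short sketch below (the padding convention, i.e. which token ids are treated as padding, is an assumption rather than the paper's exact protocol):

def effective_ratio(tokens, pad_ids):
    """Non-padding tokens divided by the total number of allocated tokens."""
    if not tokens:
        return 0.0
    effective = sum(tok not in pad_ids for tok in tokens)
    return effective / len(tokens)

# Example: a 512-token budget of which 420 tokens are actual content -> 420 / 512 ≈ 0.82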

LLaDA-8B Results

Method      GSM8K    MATH-500   MBPP     HumanEval
⍴-EOS       84.5%    86.1%      65.2%    87.8%
DAEDAL      74.1%    73.9%      55.0%    64.2%
Baseline    27.5%    57.1%      16.3%    32.7%

LLaDA-1.5 Results

Method      GSM8K    MATH-500   MBPP     HumanEval
⍴-EOS       83.5%    89.4%      64.1%    84.7%
DAEDAL      72.8%    73.0%      54.4%    63.8%
Baseline    14.3%    56.9%      33.5%    65.6%

Observations:

  • ⍴-EOS (red) consistently achieves the highest effective token ratio across all benchmarks; a substantial portion of its samples reach an effective ratio close to 100%
  • DAEDAL (blue) shows a moderate improvement in token utilization over the baseline
  • Baseline (orange) uses fixed-length generation and has the lowest token utilization

πŸ› οΈ Installation and Setup

Repository and Environment Setup

git clone https://github.com/yjyddq/rho-EOS.git
cd rho-EOS

conda create -n rho-EOS python=3.10
conda activate rho-EOS

pip install -r requirements.txt

Model Setup

After downloading LLaDA-8B-Instruct and LLaDA-1.5, replace the MODEL_PATH in scripts/*.sh with your local path.

🎈 Quick Start

Evaluate ⍴-EOS

# default configuration
sh scripts/eval_LLaDA_rho_EOS.sh
# specify RHO_LOW | RHO_HIGH (the implicit EOS density thresholds that trigger expansion / contraction) and SCHEDULER
sh scripts/eval_LLaDA_rho_EOS.sh --rho_low 0.3 --rho_high 0.7 --scheduler exp

Evaluate DAEDAL

sh scripts/eval_LLaDA_DAEDAL.sh

Evaluate Baseline

sh scripts/eval_LLaDA_Baseline.sh

🔗 Citation

If you find our work helpful, please consider giving us a star ⭐ and a citation 📝

@article{yang2026rho,
  title={$\rho$-$\texttt{EOS}$: Training-free Bidirectional Variable-Length Control for Masked Diffusion LLMs},
  author={Yang, Jingyi and Jiang, Yuxian and Shao, Jing},
  journal={arXiv preprint arXiv:2601.22527},
  year={2026}
}

πŸ™ Acknowledgements

This code is built upon the following repositories. Sincere thanks to the authors for their wonderful work.
