💡 ρ-EOS is a training-free, single-stage strategy for bidirectional variable-length control via the implicit EOS density (ρ). It unlocks dynamic, variable-length generation for masked dLLMs (e.g., LLaDA), achieving performance comparable to, and sometimes superior to, meticulously tuned fixed-length baselines.
Click for the full abstract of ρ-EOS
Beyond parallel generation and global context modeling, current masked diffusion large language models (dLLMs) suffer from a fundamental limitation: they require a predefined, fixed generation length, which lacks flexibility and forces an inevitable trade-off between output quality and computational efficiency. To address this, we study the denoising dynamics and find that the implicit density ($\rho$) of end-of-sequence ($\texttt{EOS}$) tokens serves as a reliable signal of generation sufficiency. In particular, the evolving implicit $\texttt{EOS}$ density during denoising reveals whether the current masked space is excessive or insufficient, thereby guiding the adjustment direction for generation length. Building on this insight, we propose $\rho$-$\texttt{EOS}$, a training-free, single-stage strategy that enables bidirectional variable-length generation for masked dLLMs. Unlike prior two-stage approaches, which require separate length-adjustment and iterative mask-insertion phases while supporting only unidirectional expansion, $\rho$-$\texttt{EOS}$ achieves bidirectional length adjustment within a unified denoising process by continuously estimating the implicit $\texttt{EOS}$ density: excessively high density triggers $\texttt{MASK}$ token contraction, while insufficient density induces expansion. Extensive experiments on mathematics and code benchmarks demonstrate that $\rho$-$\texttt{EOS}$ matches the performance of fixed-length baselines while substantially improving inference efficiency and token utilization.
- [2026/02/07] We released our code!
- [2026/01/30] We released our paper on arXiv!
Left (Standard & DAEDAL): Standard denoising requires a fixed generation length, lacking flexibility and forcing an inevitable trade-off between performance and efficiency. DAEDAL uses a two-stage approach (first adjusting length, then iteratively inserting masks) but supports only unidirectional expansion.
Right (ρ-EOS): Our method performs denoising and length adjustment simultaneously within a unified loop. By monitoring the implicit EOS density (ρ), it dynamically expands or contracts the generation length bidirectionally, achieving efficient and flexible variable-length generation.
The calculation of implicit EOS density: during the denoising process, we compute the implicit EOS density over the remaining masked positions: ρ = (# implicit EOS tokens at step t) / (# remaining MASK tokens at step t).
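For concreteness, here is a minimal PyTorch sketch of this ratio; the function name, the tensor shapes, and the greedy-argmax reading of "implicit EOS" are our illustrative assumptions rather than the repository's exact implementation:

```python
import torch

def implicit_eos_density(logits: torch.Tensor, tokens: torch.Tensor,
                         mask_id: int, eos_id: int) -> float:
    """Fraction of still-masked positions whose current prediction is EOS.

    logits: (seq_len, vocab_size) model outputs at the current denoising step.
    tokens: (seq_len,) partially denoised sequence containing MASK ids.
    """
    masked = tokens == mask_id                  # remaining MASK positions
    n_masked = int(masked.sum())
    if n_masked == 0:
        return 1.0                              # nothing left to denoise
    preds = logits.argmax(dim=-1)               # greedy "implicit" predictions
    n_implicit_eos = int((preds[masked] == eos_id).sum())
    return n_implicit_eos / n_masked
```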
(a-b) The trend of implicit EOS density during the denoising process: the figures show how ρ evolves during generation on the GSM8K and MBPP benchmarks. Different colors represent different generation lengths. The implicit EOS density gradually converges as denoising progresses, providing a reliable signal for determining when the generation is sufficient.
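Building on this converging signal, the bidirectional adjustment can be sketched as a simple thresholded update. Everything below (the helper name, the step size, and the all-MASK tail check before contraction) is a hypothetical illustration, with rho_low / rho_high mirroring the script flags shown in the Usage section:

```python
import torch

def adjust_masked_space(tokens: torch.Tensor, rho: float, mask_id: int,
                        rho_low: float = 0.3, rho_high: float = 0.7,
                        step: int = 32) -> torch.Tensor:
    """Bidirectional length adjustment driven by the implicit EOS density rho.

    rho > rho_high -> masked space looks excessive: drop trailing MASKs.
    rho < rho_low  -> masked space looks insufficient: append fresh MASKs.
    """
    if rho > rho_high and bool((tokens[-step:] == mask_id).all()):
        return tokens[:-step]                   # contract: tail is still all MASK
    if rho < rho_low:
        pad = torch.full((step,), mask_id, dtype=tokens.dtype,
                         device=tokens.device)
        return torch.cat([tokens, pad])         # expand with fresh MASK tokens
    return tokens                               # density within band: keep length
```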
The table compares ρ-EOS with DAEDAL across various initial generation lengths (128, 256, 512, 1024) on both LLaDA-8B and LLaDA-1.5 models. We evaluate on four benchmarks: GSM8K, MATH-500, MBPP, and HumanEval.
Key findings:
- Robustness to initial length: ρ-EOS maintains stable performance regardless of the initial generation length
- Better token efficiency and faster evaluation: ρ-EOS achieves higher effective token ratios and reduces evaluation runtime
- ρ-EOS outperforms DAEDAL in most settings, demonstrating superior accuracy (Acc) and effective token ratio (E-ratio), and achieves the best trade-off between generation quality and efficiency across all benchmarks
We compare the generation efficiency of ρ-EOS against DAEDAL and the fixed-length baseline across four benchmarks. The Effective Ratio measures how efficiently tokens are utilized: non-padding tokens / total tokens (a minimal sketch of this computation follows the observations below).
LLaDA-8B:

| Method | GSM8K | MATH-500 | MBPP | HumanEval |
|---|---|---|---|---|
| ρ-EOS | 84.5% | 86.1% | 65.2% | 87.8% |
| DAEDAL | 74.1% | 73.9% | 55.0% | 64.2% |
| Baseline | 27.5% | 57.1% | 16.3% | 32.7% |
LLaDA-1.5:

| Method | GSM8K | MATH-500 | MBPP | HumanEval |
|---|---|---|---|---|
| ρ-EOS | 83.5% | 89.4% | 64.1% | 84.7% |
| DAEDAL | 72.8% | 73.0% | 54.4% | 63.8% |
| Baseline | 14.3% | 56.9% | 33.5% | 65.6% |
Observations:
- ρ-EOS (red) consistently achieves the highest effective token ratio across all benchmarks; a substantial portion of its samples reach an effective token ratio close to 100%
- DAEDAL (blue) shows a moderate improvement in token utilization over the baseline
- Baseline (orange) uses fixed-length generation and has the lowest token utilization
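As a reference for the Effective Ratio defined above, here is a minimal Python sketch; the function name and the choice of which token ids count as padding (e.g., trailing EOS/PAD) are our assumptions, not the repository's implementation:

```python
def effective_ratio(output_ids: list[int], pad_ids: set[int]) -> float:
    """Effective Ratio = non-padding tokens / total generated tokens."""
    if not output_ids:
        return 0.0
    effective = sum(tok not in pad_ids for tok in output_ids)
    return effective / len(output_ids)

# Example: treat EOS-padding after the answer as non-effective
# (eos_id / pad_id are placeholders for the tokenizer's actual ids).
# ratio = effective_ratio(generated_ids, pad_ids={eos_id, pad_id})
```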
```bash
git clone https://github.com/yjyddq/rho-EOS.git
cd rho-EOS
conda create -n rho-EOS python=3.10
conda activate rho-EOS
pip install -r requirements.txt
```

After downloading LLaDA-8B-Instruct and LLaDA-1.5, replace the `MODEL_PATH` in `scripts/*.sh` with your local path.
```bash
# default configuration
sh scripts/eval_LLaDA_rho_EOS.sh

# specify RHO_LOW | RHO_HIGH and SCHEDULER
sh scripts/eval_LLaDA_rho_EOS.sh --rho_low 0.3 --rho_high 0.7 --scheduler exp

# DAEDAL and fixed-length baseline
sh scripts/eval_LLaDA_DAEDAL.sh
sh scripts/eval_LLaDA_Baseline.sh
```

If you find our work helpful, please consider giving a star ⭐ and a citation.
```bibtex
@article{yang2026rho,
  title={$\rho$-$\texttt{EOS}$: Training-free Bidirectional Variable-Length Control for Masked Diffusion LLMs},
  author={Yang, Jingyi and Jiang, Yuxian and Shao, Jing},
  journal={arXiv preprint arXiv:2601.22527},
  year={2026}
}
```

This code is built upon prior open-source repositories; sincere thanks to the authors for their wonderful work.












