
Self-Rewarding SMC for Masked Diffusion Language Models
Official PyTorch Implementation

Self-Rewarding Sequential Monte Carlo (SMC) for Masked Diffusion Language Models

Project Page | Paper

[Figure: sr-smc]

Overview

TL;DR. Self-Rewarding SMC is an inference-time scaling method that leverages trajectory-level confidence from diffusion models as importance weights to steer generation toward globally confident, high-quality samples.

  • Self-Rewarding SMC is reward-free and thus can be applied to arbitrary pretrained models and tasks!

In this repository, we provide evaluations on standard Masked Diffusion Language Models (MDLMs) including MDLM and BD3-LMs, and diffusion large language models (dLLMs) including LLaDA-1.5 and Dream-7B.
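To make the mechanism concrete, the snippet below is a minimal, simplified sketch of one self-rewarding SMC step, not the repository implementation: each particle is a partially masked sequence, the model's own confidence in the tokens it unmasks is accumulated as a log importance weight (the "self-reward"), and particles are multinomially resampled in proportion to those weights. The HF-style `model(...)` call, the one-token-per-step unmasking rule, and the per-step resampling schedule are illustrative assumptions; see `llada/generate_smc.py` and `dream/model/generation_utils_smc_block.py` for the actual implementations.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def smc_denoise_step(model, particles, log_weights, mask_id, temperature=1.0):
    """One simplified self-rewarding SMC step (illustrative sketch).

    particles:   (P, L) token ids; positions equal to `mask_id` are still masked
    log_weights: (P,)   accumulated log importance weights per particle
    Assumes every particle still has at least one masked position.
    """
    logits = model(particles).logits                      # (P, L, V); HF-style model assumed
    probs = F.softmax(logits / temperature, dim=-1)

    # Sample a candidate token at every position and record its confidence.
    sampled = torch.distributions.Categorical(probs=probs).sample()  # (P, L)
    conf = probs.gather(-1, sampled.unsqueeze(-1)).squeeze(-1)       # (P, L)

    # Unmask the single most confident masked position in each particle.
    conf = conf.masked_fill(particles.ne(mask_id), -1.0)
    pos = conf.argmax(dim=-1)                                        # (P,)
    idx = torch.arange(particles.size(0), device=particles.device)
    particles = particles.clone()
    particles[idx, pos] = sampled[idx, pos]

    # Self-reward: the model's own confidence becomes the importance weight.
    log_weights = log_weights + conf[idx, pos].clamp_min(1e-12).log()

    # Resample particles in proportion to their weights, then reset the weights.
    w = F.softmax(log_weights, dim=0)
    keep = torch.multinomial(w, particles.size(0), replacement=True)
    return particles[keep], torch.zeros_like(log_weights)
```

In practice, SMC implementations often resample only when the effective sample size drops below a threshold rather than at every step; the sketch resamples every step purely for brevity.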

Code Structure

.
├── 🚀bd3lms/                        # Experiments on MDLM and BD3-LMs
│   ├── configs/                     # Model and inference configurations
│   │   └── config.yaml              # Main configurations, updated for [SMC]
│   ├── scripts/                     # Sampling scripts
│   └── diffusion.py                 # Core algorithm, updated for [SMC]
│
├── 🚀llada/                         # Experiments on LLaDA-1.5
│   ├── eval_llada.py                # Main evaluation code for LLaDA-1.5
│   ├── generate_smc.py              # Self-rewarding [SMC] implementation
│   └── eval.md                      # Evaluation instructions for LLaDA
│
├── 🚀dream/                         # Experiments on Dream-7B
│   ├── model/
│   │   └── generation_utils_smc_block.py   # Self-rewarding [SMC] implementation
│   ├── eval.py                      # Main evaluation code for Dream-7B
│   └── eval.md                      # Evaluation instructions for Dream
│
├── images/                          # Figures used in README
│
├── README.md                        # Project overview and instructions
└── LICENSE

Each subdirectory contains model-specific configurations and scripts for running self-rewarding SMC at inference time.

As an example, run the following commands for sample-quality evaluation:

cd bd3lms
sh scripts/gen_ppl/genppls_batch.sh

Evaluation on MDLMs

Self-rewarding SMC improves the generative perplexity:

[Figure: sr-smc-mdlm]

Evaluation on dLLMs

1. Self-rewarding SMC improves dLLMs on math and coding benchmarks:

[Figure: sr-smc-dllms-1]

2. Overall performance trends as the number of particles increases:

[Figure: sr-smc-dllms-2]

3. Effect of the Gumbel noise temperature on model performance (a short sketch of this mechanism follows below):

[Figure: sr-smc-dllms-3]
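As background for the last plot, the Gumbel noise temperature controls how much randomness is injected when tokens are drawn with the Gumbel-max trick during decoding. The function below is a generic illustration of that mechanism (a sketch of the general technique, not a copy of the Dream or LLaDA decoding code): a temperature of 0 reduces to greedy argmax decoding, 1.0 samples from the softmax distribution, and intermediate values interpolate between the two.

```python
import torch

def gumbel_max_sample(logits: torch.Tensor, gumbel_temperature: float = 1.0) -> torch.Tensor:
    """Draw token ids from `logits` via the Gumbel-max trick.

    gumbel_temperature = 0.0 recovers greedy decoding; 1.0 samples from the
    softmax distribution; values in between interpolate between the two.
    """
    u = torch.rand_like(logits).clamp_min(1e-9)
    gumbel_noise = -torch.log(-torch.log(u))     # standard Gumbel(0, 1) samples
    return (logits + gumbel_temperature * gumbel_noise).argmax(dim=-1)
```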

Contributing

Issues and Pull Requests are welcome!

Acknowledgements

We would like to thank the authors of Fast-dLLM and bd3lms for their excellent work and open-source contributions.

Citation

If you find this work useful, please cite our paper:

@article{luo2026self,
  title={Self-Rewarding Sequential Monte Carlo for Masked Diffusion Language Models},
  author={Luo, Ziwei and Jin, Ziqi and Wang, Lei and Bing, Lidong and Sch{\"o}n, Thomas B},
  journal={arXiv preprint arXiv:2602.01849},
  year={2026}
}
