From black-box heuristics to white-box diagnostics for RLVR training collapse
Implementation patch for GradLoc, built on top of a fixed verl commit.
This repository implements the GradLoc component described in our blog post on diagnosing and stabilizing RLVR training collapse.
The current release focuses on the GradLoc demo patch:
- GradLoc: localizes gradient spikes to the exact culprit tokens with a distributed binary search (O(log N)).
Figure 2. GradLoc localization path: global -> micro-batch -> rank -> token, with adaptive thresholds.
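The final, token-level stage of the localization path can be illustrated with a toy bisection. This is a minimal single-process sketch, not the code in the patch: `localize_spike`, `grad_norm_of`, and `BudgetExhausted` are hypothetical names, each call to `grad_norm_of` stands in for one forward/backward probe, and the toy probe assumes per-token contributions combine as a root-sum-of-squares, which real gradients need not do.

```python
import math
from typing import Callable, List, Optional, Sequence


class BudgetExhausted(Exception):
    """Raised internally when the probe budget runs out."""


def localize_spike(
    token_ids: Sequence[int],
    grad_norm_of: Callable[[List[int]], float],
    threshold: float = 640.0,
    budget: int = 128,
) -> Optional[int]:
    """Bisect token_ids down to a single token whose probe exceeds threshold.

    Returns None if no spike is found or the probe budget is exhausted.
    """
    candidates = list(token_ids)
    remaining = budget

    def exceeds(subset: List[int]) -> bool:
        nonlocal remaining
        if remaining <= 0:
            raise BudgetExhausted
        remaining -= 1  # every probe costs one unit of budget
        return grad_norm_of(subset) > threshold

    try:
        if not candidates or not exceeds(candidates):
            return None  # no spike in the full set
        while len(candidates) > 1:
            mid = len(candidates) // 2
            left, right = candidates[:mid], candidates[mid:]
            if exceeds(left):
                candidates = left
            elif exceeds(right):
                candidates = right
            else:
                return None  # spike only appears when both halves combine
    except BudgetExhausted:
        return None
    return candidates[0]


# Toy data: token 2 carries an outsized gradient contribution.
norms = [1.0, 2.0, 900.0, 0.5, 3.0, 1.5, 2.5, 0.1]


def toy_probe(subset: List[int]) -> float:
    # Assumption: contributions are orthogonal, so norms add in quadrature.
    return math.sqrt(sum(norms[i] ** 2 for i in subset))


culprit = localize_spike(range(len(norms)), toy_probe)
print(culprit)
```

Each round halves the candidate set and spends at most two probes, so the number of probes grows logarithmically with the number of tokens. The budget caps the cost when a spike cannot be isolated.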
This repo is intentionally lightweight and patch-oriented, so you can directly apply changes to upstream verl and reproduce experiments.
We plan to further package GradLoc as a cleaner, configurable feature with better veRL integration and upstream-merge readiness in future releases.
The following arguments in run_experiment.sh are the core runtime knobs for GradLoc.
They control trigger sensitivity, search budget, and dump path.
- actor_rollout_ref.actor.grad_norm_threshold=640.0: spike trigger threshold for the token-level gradient norm.
- actor_rollout_ref.actor.bisect_budget_steps=128: maximum binary-search budget, counted in forward/backward probes.
- actor_rollout_ref.actor.bisect_dump_dir="${CKPTS_DIR}/bisect_dump": output directory for localization artifacts.

The patch is based on:
- Upstream: verl
- Commit: f9c855f7cf04d603c9546bc01776c74806a879c1
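For scripted sweeps over the three GradLoc knobs, the overrides can be generated rather than edited by hand. The helper below is a hypothetical convenience, not part of the repository. It only formats the three documented keys as `key=value` strings and assumes the launch command accepts them as trailing arguments, as run_experiment.sh suggests.

```python
import shlex
from typing import List


def gradloc_overrides(
    ckpts_dir: str,
    grad_norm_threshold: float = 640.0,  # spike trigger threshold
    bisect_budget_steps: int = 128,      # max forward/backward probes
) -> List[str]:
    """Format the GradLoc runtime knobs as key=value override strings."""
    prefix = "actor_rollout_ref.actor"
    return [
        f"{prefix}.grad_norm_threshold={grad_norm_threshold}",
        f"{prefix}.bisect_budget_steps={bisect_budget_steps}",
        f"{prefix}.bisect_dump_dir={ckpts_dir}/bisect_dump",
    ]


overrides = gradloc_overrides("/tmp/ckpts")
# Shell-quote each override so it can be pasted into a command line.
print(" ".join(shlex.quote(o) for o in overrides))
```

Keeping the explanatory comments in Python avoids placing `#` comments after a trailing backslash in a shell script, where they would break the line continuation.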
Files modified by the patch:
- verl/trainer/ppo/ray_trainer.py
- verl/utils/reward_score/__init__.py
- verl/utils/reward_score/math_verify.py
- verl/workers/actor/dp_actor.py
- Clone upstream verl and check out the base commit:
git clone https://github.com/volcengine/verl.git
cd verl && git checkout f9c855f7cf04d603c9546bc01776c74806a879c1
- Apply patch from URL:
python /path/to/GradLoc-Patch/apply_patch.py --repo /path/to/verl --patch-url <PATCH_URL> --sha256-file <SHA256_URL>
If patches/gradloc.patch is already available locally:
python /path/to/GradLoc-Patch/apply_patch.py --repo /path/to/verl --patch-file /path/to/GradLoc-Patch/patches/gradloc.patch
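When a patch is fetched from a URL together with a SHA-256 file, the download can also be checked by hand before applying it. The sketch below shows one way to do that with the standard library. It does not reproduce apply_patch.py: the function names are hypothetical, and it assumes the checksum file holds either a bare hex digest or the `sha256sum` layout (`<hex>  <filename>`).

```python
import hashlib
from pathlib import Path


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_expected(sha256_text: str) -> str:
    """Take the first whitespace-separated field as the expected digest."""
    return sha256_text.split()[0].lower()


def verify_patch(patch_path: str, sha256_path: str) -> bool:
    """True if the patch file matches the digest stored in sha256_path."""
    expected = parse_expected(Path(sha256_path).read_text())
    return sha256_of(patch_path) == expected
```

A call such as `verify_patch("patches/gradloc.patch", "gradloc.patch.sha256")` returns `False` on a corrupted or tampered download, in which case the patch should not be applied. The checksum file name here is only an example.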
- Run the experiment:
bash /path/to/GradLoc-Patch/run_experiment.sh
When code is modified on top of the base commit, regenerate the patch with:
bash /path/to/GradLoc-Patch/make_patch.sh --repo /path/to/verl
This rewrites patches/gradloc.patch from:
git diff <base_commit> <current_head>
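The same two-commit diff can be produced from Python when the regeneration step needs to run inside a larger script. This is a minimal sketch that assumes `git` is on the PATH. `diff_command` and `write_patch` are hypothetical helpers, and make_patch.sh may do more than this.

```python
import subprocess
from pathlib import Path
from typing import List


def diff_command(base_commit: str, head: str = "HEAD") -> List[str]:
    """Build the git invocation that compares the base commit with head."""
    return ["git", "diff", base_commit, head]


def write_patch(repo: str, out_file: str, base_commit: str, head: str = "HEAD") -> int:
    """Run the diff inside repo and write it to out_file; return bytes written."""
    result = subprocess.run(
        diff_command(base_commit, head),
        cwd=repo,
        check=True,           # raise if git exits with an error
        capture_output=True,  # keep the raw patch bytes in result.stdout
    )
    Path(out_file).write_bytes(result.stdout)
    return len(result.stdout)
```

Because the diff compares two commits, uncommitted changes in the working tree are not included. Commit them first if they belong in the patch.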
- Guanhua Huang: carlan0974@gmail.com
- Tingqiang Xu: xtq23@mails.tsinghua.edu.cn
- Jinbo Wang: wangjinbo@stu.pku.edu.cn (wangjinbo@ustc.edu for long-term contact)
If you find this project useful, please cite:
@misc{huang-xu-wang-2026-gradloc,
title = {Stabilizing RLVR via Token-level Gradient Diagnosis and Layerwise Clipping},
author = {Huang, Guanhua and Xu, Tingqiang and Wang, Jinbo and Sheng, Guangming and Li, Siheng and Yang, Evander and Li, Kejiao and Li, Yunxiang and Xu, Zenan and Yi, Qi and Gong, Xue and Nan, Ziyuan and Jiang, Yuhao and Zhang, Chenchen and Wu, Taiqiang and Zhang, Feiyuan and Wang, Junhao and Zhou, Bo and Chen, Alex and Wang, Di and Yao, Shunyu},
year = {2026},
url = {https://hy.tencent.com/research/100015}
}