Code accompanying the paper *Generalized Gradient Norm Clipping & Non-Euclidean $(L_0, L_1)$-Smoothness*.
This paper is a follow-up to *Training Deep Learning Models with Norm-Constrained LMOs* and is based on the Scion codebase.
- `clippedscion.py`: Contains the `UnconstrainedClippedScion` and `ClippedScion` reference implementations along with various norm choices.
  - Algorithm 3 corresponds to `UnconstrainedClippedScion`.
  - Algorithm 4 (Variant 2) corresponds to `ClippedScion`. For simplicity, we control $\min\{\rho, \sum_{l=1}^D \langle d^k_l, v^k_l \rangle\}$ in practice (see the sketch after this list).
- `examples/`: Example usage containing airbench, nanoGPT, and DeiT experiments with and without weight sharing.
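As a rough illustration of the quantity that the clipping threshold caps, here is a hedged sketch (the function name and variable names are ours; see Algorithms 3 & 4 in the paper for the exact update rule):

```python
import torch

# Hedged sketch of the quantity thresholded in Algorithms 3 & 4. `ds` are the
# layerwise momentum estimates d^k_l and `vs` the corresponding LMO outputs
# v^k_l; sign conventions follow the paper, so the pairing is assumed
# nonnegative. This illustrates the controlled quantity, not the exact update.
def clipped_pairing(ds: list[torch.Tensor], vs: list[torch.Tensor], rho: float) -> float:
    pairing = sum((d * v).sum().item() for d, v in zip(ds, vs))  # sum_l <d^k_l, v^k_l>
    return min(rho, pairing)  # the controlled quantity: min{rho, sum_l <d_l, v_l>}
```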
The `ClippedScion` optimizer comes with the following hyperparameters:
- `momentum`: This parameter equals `1 - usual_momentum` of, e.g., the PyTorch implementation of SGD with momentum. A good default is 0.1. Higher values (e.g., 0.5) seem to work better for short training runs with low noise, as also supported by theory.
- `scale`: Controls the per-layer constraint radius factor. The layerwise radius can be tuned on a small proxy model, similarly to the input and output scaling factors of µP.
- `lr`: The learning rate, which can similarly be tuned on a small proxy model (corresponds to γ in the paper).
- `unconstrained`: When set to `False`, the constrained variant of ClippedScion is used, which guarantees that the iterates stay bounded.
- `rho`: The clipping threshold; it controls $\sum_{l=1}^D \langle d^k_l, v^k_l \rangle$ in Algorithms 3 & 4.
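Putting these together, a minimal construction might look like the following. This is a sketch only: the parameter grouping is illustrative, the module name is assumed to match `clippedscion.py`, and passing `unconstrained` to the constructor is our reading of the hyperparameter list above (complete configurations are shown further below):

```python
from clippedscion import ClippedScion  # assumed import path, matching clippedscion.py

# Illustrative single parameter group; real configurations are given below.
optim_groups = [{
    'params': model.parameters(),
    'norm': 'Auto',        # picks the layerwise norm based on the parameter shape
    'norm_kwargs': {},
    'scale': 8.0,          # per-layer constraint radius factor
}]
optimizer = ClippedScion(
    optim_groups,
    lr=2**-4,              # gamma in the paper
    momentum=0.1,          # = 1 - usual_momentum
    unconstrained=False,   # constrained variant: iterates stay bounded (assumed kwarg)
    rho=1600,              # clipping threshold
)
```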
Architectural changes:
- Scale activation functions (ReLU, GELU) by √2 to maintain the input variance.
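For instance, a √2-scaled GELU could be written as follows (a minimal sketch; the module name is ours):

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class ScaledGELU(nn.Module):
    """GELU scaled by sqrt(2) so the activation roughly preserves input variance."""
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return math.sqrt(2) * F.gelu(x)
```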
For runnable examples, see `examples/`.
Below are some pseudocode configurations for different architectures and domains (see Appendix C for exact parameter choices):
- nanoGPT with weight sharing (see `examples/modded-nanogpt`):

  ```python
  radius = 50.0
  threshold = 600

  optim_groups = [{
      'params': model.transformer.h.parameters(),
      'norm': 'Spectral',
      'norm_kwargs': {},
      'scale': radius,
  }, {
      'params': model.lm_head.parameters(),
      'norm': 'Sign',
      'norm_kwargs': {},
      'scale': radius * 60.0,
  }]
  optimizer = UnconstrainedClippedScion(optim_groups, lr=2**-12, momentum=0.1, rho=threshold)
  ```
- CNN (see `examples/airbench` for further details):

  ```python
  radius = 8.0
  threshold = 1600

  optim_groups = [{
      'params': remaining_parameters,
      'norm': 'Auto',  # picks the layerwise norm based on the parameter shape
      'norm_kwargs': {},
      'scale': radius,
  }, {
      'params': output_layer,
      'norm': 'Sign',
      'norm_kwargs': {'normalized': True},
      'scale': radius * 16,
  }]
  optimizer = UnconstrainedClippedScion(optim_groups, lr=2**-4, momentum=0.5, rho=threshold)
  ```
- DeiT:

  ```python
  radius = 25
  threshold = 8000

  optim_groups = [{
      'params': other_params,
      'norm': 'Auto',
      'norm_kwargs': {},
      'scale': radius,
  }, {
      'params': head_weights,
      'norm': 'Sign',
      'norm_kwargs': {},
      'scale': radius * 20,
  }, {
      'params': [pos_embed_param, cls_token_param],
      'norm': 'BiasRMS',
      'norm_kwargs': {},
      'scale': radius,
  }]
  optimizer = UnconstrainedClippedScion(optim_groups, lr=8e-5, momentum=0.1, rho=threshold)
  ```
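Once constructed, the optimizer is used like any other PyTorch optimizer (a hedged sketch, assuming the implementation follows the standard `torch.optim.Optimizer` interface; `dataloader`, `model`, and `loss_fn` are placeholders):

```python
# Standard PyTorch training step; only the optimizer construction above is
# specific to ClippedScion.
for inputs, targets in dataloader:
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()
    optimizer.step()
```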
If you find this work useful, please cite it as follows:
```bibtex
@article{pethick2025generalized,
  title={Generalized Gradient Norm Clipping \& Non-Euclidean $(L_0, L_1)$-Smoothness},
  author={Pethick, Thomas and Xie, Wanyun and Erdogan, Mete and Antonakopoulos, Kimon and Silveti-Falls, Tony and Cevher, Volkan},
  journal={arXiv preprint arXiv:2506.01913},
  year={2025}
}
```