v0.1.1: Initial Release

@warner-benjamin released this 19 Nov 01:56
· 113 commits to main since this release

optimī

Fast, Modern, and Low Precision PyTorch Optimizers

optimi enables accurate low precision training via Kahan summation, supports fully decoupled weight decay, and features fast implementations of modern optimizers.

Low Precision Training with Kahan Summation

optimi optimizers can match the performance of mixed precision when training in BFloat16 by using Kahan summation.

Training in BFloat16 with Kahan summation can reduce non-activation training memory usage by 37.5 to 45.5 percent when using an Adam optimizer. BFloat16 training also increases single-GPU training speed by ~10 percent at the same batch size.
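As a minimal sketch (assuming the optimi.AdamW import and that Kahan summation is enabled automatically for low precision parameters, which is how the library documents it), BFloat16 training needs no autocast context or GradScaler; casting the model and inputs to BFloat16 is enough:

import torch
from torch import nn
from optimi import AdamW

# create or cast the model in BFloat16 instead of using mixed precision
model = nn.Linear(20, 1, dtype=torch.bfloat16)

# Kahan summation is assumed to be applied automatically to the BFloat16
# parameters, so no GradScaler or autocast context is needed
opt = AdamW(model.parameters(), lr=1e-3)

# forward and backward pass entirely in BFloat16
loss = model(torch.randn(20, dtype=torch.bfloat16)).sum()
loss.backward()

opt.step()
opt.zero_grad()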

Fully Decoupled Weight Decay

In addition to supporting PyTorch-style decoupled weight decay, optimi optimizers also support fully decoupled weight decay.

Fully decoupled weight decay decouples weight decay from the learning rate, more accurately following Decoupled Weight Decay Regularization. This can help simplify hyperparameter tuning as the optimal weight decay is no longer tied to the learning rate.
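A hedged sketch of the difference, reusing the model from the sketch above and assuming fully decoupled weight decay is enabled with a decouple_lr flag (the flag name is an assumption, not confirmed by this release note):

from optimi import AdamW

# PyTorch-style decoupled weight decay: the decay applied each step
# still scales with the learning rate
opt = AdamW(model.parameters(), lr=1e-3, weight_decay=1e-2)

# fully decoupled weight decay (assumed flag name: decouple_lr): decay is
# applied independently of the learning rate, so the optimal value no longer
# shifts with the learning rate and smaller values are typically used
opt = AdamW(model.parameters(), lr=1e-3, weight_decay=1e-5, decouple_lr=True)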

Foreach Implementations

All optimi optimizers have fast foreach implementations, which can significantly outperform the for-loop versions. optimi reuses the gradient buffer for temporary variables to reduce foreach memory usage.
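As a sketch, assuming a foreach argument that defaults to automatic selection (the argument name and default behavior are assumptions):

from optimi import AdamW

# assumed: foreach is chosen automatically when unset; it can be forced on
opt = AdamW(model.parameters(), lr=1e-3, foreach=True)

# or forced off to fall back to the slower for-loop implementation
opt = AdamW(model.parameters(), lr=1e-3, foreach=False)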

Documentation

https://optimi.benjaminwarner.dev

Install

optimi is available to install from PyPI:

pip install torch-optimi

Optimizers

optimi implements the following optimizers: