v0.3.0: Fast Triton Kernels

@warner-benjamin released this 15 Jul 20:36
· 18 commits to main since this release
e155965

This release adds Triton kernels for all optimi optimizers and sets them as the default. optimi's vertically fused Triton kernels are faster than PyTorch's vertically and horizontally fused CUDA kernels and are nearly as fast as compiled optimizers.
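
For readers unfamiliar with the term, here is a toy sketch of what "vertically fused" means. It is not optimi's Triton code or PyTorch's implementation: plain Python lists stand in for GPU tensors, each call to a function marked with the hypothetical `@kernel` decorator stands in for one kernel launch, and SGD with momentum is used only because it is short. All names in the block are invented for illustration.

```python
# Illustrative sketch only: lists stand in for GPU tensors, and every call to
# a @kernel function stands in for one GPU kernel launch.
launches = 0

def kernel(fn):
    """Hypothetical decorator that counts calls as 'kernel launches'."""
    def wrapper(*args):
        global launches
        launches += 1
        return fn(*args)
    return wrapper

@kernel
def mul(x, s):
    # Elementwise x * s
    return [xi * s for xi in x]

@kernel
def add(x, y):
    # Elementwise x + y
    return [xi + yi for xi, yi in zip(x, y)]

@kernel
def fused_sgd(p, g, b, lr, mu):
    # Vertical fusion: the momentum update and the parameter update for one
    # parameter happen inside a single kernel.
    b = [mu * bi + gi for bi, gi in zip(b, g)]
    p = [pi - lr * bi for pi, bi in zip(p, b)]
    return p, b

def step_unfused(params, grads, bufs, lr, mu):
    # One kernel per operation, per parameter.
    out = []
    for p, g, b in zip(params, grads, bufs):
        b = add(mul(b, mu), g)
        p = add(p, mul(b, -lr))
        out.append((p, b))
    return out

def step_vertical(params, grads, bufs, lr, mu):
    # One fused kernel per parameter.
    return [fused_sgd(p, g, b, lr, mu) for p, g, b in zip(params, grads, bufs)]

def count(step, *args):
    # Run one optimizer step and report how many 'kernels' it launched.
    global launches
    launches = 0
    result = step(*args)
    return result, launches

params = [[1.0, 2.0], [3.0]]
grads = [[0.5, 0.5], [1.0]]
bufs = [[0.0, 0.0], [0.0]]

unfused_out, unfused_launches = count(step_unfused, params, grads, bufs, 0.5, 0.5)
vertical_out, vertical_launches = count(step_vertical, params, grads, bufs, 0.5, 0.5)
print(unfused_launches, vertical_launches)
```

Both steps produce the same updated parameters, but the vertically fused version launches one kernel per parameter instead of one per operation. Horizontal fusion, as in PyTorch's foreach implementations, is the complementary idea: each operation is batched across all parameters at once.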

Full Changelog: v0.2.1...v0.3.0