Refactor optimizer implementations and improve multi_tensor ops #36
+254
−144
Summary
Refactor the FlagOS optimizer and multi_tensor implementations to better match CUDA behavior and improve code quality.
Changes
- `fused_adam.py` (FlagOS backend)
  - `inv_scale` and `out_dtype` parameters from `multi_tensor_adam_fl`
  - `multi_tensor_adam_param_remainder_fl`: rewrite FP32 master weight reconstruction using bit manipulation (int16 high/low bits), matching the CUDA implementation exactly (see the sketch after this list)
- `multi_tensor.py` (FlagOS backend)
  - `multi_tensor_l2_norm_fl`: add proper type hints, a noop_flag check, inf/nan detection, and replace the raw `**`/`+` operators with `flag_gems.mul`/`flag_gems.add`
  - `multi_tensor_scale_fl`: add type hints, a noop_flag check, inf/nan detection, and replace `src * scale` with `flag_gems.mul(src, scale)` (see the sketch after this list)
- `optimizer.py` (reference backend)
  - Update `multi_tensor_l2norm_torch` and `multi_tensor_adam_torch` to match the new signatures and CUDA behavior (L2 vs AdamW mode split; see the sketch after this list)
  - `multi_tensor_adam_param_remainder_torch` with bit manipulation matching CUDA
  - Rename `eps` → `epsilon` for consistency
- `optimizers/__init__.py`
  - `multi_tensor_scale` and `multi_tensor_l2norm`

Misc
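The sketches below illustrate the techniques referenced in the Changes list. They are minimal, hedged sketches in plain PyTorch: function names, signatures, and helpers are illustrative and are not the actual code in this PR.

First, the bf16-parameter-plus-int16-remainder representation behind `multi_tensor_adam_param_remainder_fl` / `multi_tensor_adam_param_remainder_torch`: the FP32 master weight is kept as a bf16 parameter (high 16 bits) plus a signed int16 remainder (low 16 bits), and is reconstructed by recombining the two halves. The helper names and the halfway-rounding handling here are simplifications, not a claim about the exact CUDA kernel.

```python
import torch

def split_fp32_master(master: torch.Tensor):
    """Split fp32 master weights into a bf16 parameter (high 16 bits)
    and a signed int16 remainder (low 16 bits). Hypothetical helper."""
    bits = master.view(torch.int32).to(torch.int64)
    # Carry the low half's MSB into the high half so the remainder stays in int16 range.
    high = torch.div(bits + 0x8000, 1 << 16, rounding_mode="floor")
    low = (bits - (high << 16)).to(torch.int16)              # remainder in [-32768, 32767]
    param_bf16 = high.to(torch.int16).view(torch.bfloat16)   # NaN payload edge cases not handled
    return param_bf16, low

def reconstruct_fp32_master(param_bf16: torch.Tensor, remainder: torch.Tensor) -> torch.Tensor:
    """Rebuild fp32 master weights: high 16 bits come from the bf16 param,
    low 16 bits from the sign-extended remainder (which undoes the rounding carry)."""
    high = param_bf16.view(torch.int16).to(torch.int32)
    bits = (high << 16) + remainder.to(torch.int32)
    return bits.view(torch.float32)

# Round trip: master == reconstruct_fp32_master(*split_fp32_master(master))
```

Second, the noop_flag / non-finite-detection pattern added to `multi_tensor_scale_fl` (and, analogously, `multi_tensor_l2_norm_fl`). This sketch uses plain torch ops where the FlagOS backend calls `flag_gems.mul`, and the signature is illustrative only; it also skips the chunking that the fused multi_tensor kernels perform.

```python
from typing import List
import torch

def multi_tensor_scale_sketch(
    noop_flag: torch.Tensor,
    srcs: List[torch.Tensor],
    dsts: List[torch.Tensor],
    scale: float,
) -> None:
    """Write src * scale into dst; skip work if noop_flag is already set,
    and raise the flag when a non-finite value shows up."""
    if noop_flag.item() != 0:
        return  # an earlier kernel already flagged a problem, so this call is a no-op
    for src, dst in zip(srcs, dsts):
        scaled = src * scale  # the FlagOS path uses flag_gems.mul(src, scale) instead
        if not torch.isfinite(scaled).all():
            noop_flag.fill_(1)  # report inf/nan so the caller can skip the optimizer step
        dst.copy_(scaled)
```

Third, the L2 vs AdamW mode split that `multi_tensor_adam_torch` now mirrors. In L2 mode, weight decay is folded into the gradient before the moment updates; in AdamW mode, the decoupled decay is added to the final update. Parameter names and the exact update order are illustrative.

```python
from typing import List
import torch

def adam_step_sketch(
    params: List[torch.Tensor],
    grads: List[torch.Tensor],
    exp_avgs: List[torch.Tensor],
    exp_avg_sqs: List[torch.Tensor],
    lr: float,
    beta1: float,
    beta2: float,
    epsilon: float,
    step: int,
    weight_decay: float,
    adam_w_mode: bool,
) -> None:
    """Per-tensor Adam update illustrating the L2 vs AdamW mode split."""
    bias_correction1 = 1 - beta1 ** step
    bias_correction2 = 1 - beta2 ** step
    for p, g, m, v in zip(params, grads, exp_avgs, exp_avg_sqs):
        if not adam_w_mode and weight_decay != 0:
            # L2 mode: weight decay enters the gradient before the moments.
            g = g + weight_decay * p
        m.mul_(beta1).add_(g, alpha=1 - beta1)
        v.mul_(beta2).addcmul_(g, g, value=1 - beta2)
        denom = (v / bias_correction2).sqrt_().add_(epsilon)
        update = (m / bias_correction1) / denom
        if adam_w_mode and weight_decay != 0:
            # AdamW mode: decoupled decay is applied directly to the parameter update.
            update = update + weight_decay * p
        p.add_(update, alpha=-lr)
```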