add rmsnorm CK_TILE_FLOAT_TO_BFLOAT16_DEFAULT compile config #1978

Open
zhyajie wants to merge 1 commit into dev/perf from zyj_dev

zhyajie commented Feb 5, 2026

Summary

Add a compile-time configuration for the FP32 to BF16 rounding mode in the RMSNorm CK kernel.

This PR lets users control the FP32 to BF16 conversion behavior of the CK RMSNorm kernel via the CK_TILE_FLOAT_TO_BFLOAT16_DEFAULT environment variable. The value is applied as a compile flag, so it takes effect only after aiter is rebuilt.

Changes

  • Add -DCK_TILE_FLOAT_TO_BFLOAT16_DEFAULT compile flag to module_rmsnorm in optCompilerConfig.json
  • Default value is 2 (Truncate mode, same as CK default)
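
As a rough illustration of the change above, the sketch below shows one way an environment variable could be turned into the `-DCK_TILE_FLOAT_TO_BFLOAT16_DEFAULT` flag with a fallback of 2. It is not the code in this PR: the helper name `bf16_rounding_flag` and the validation step are assumptions made for illustration, and the actual entry in optCompilerConfig.json may be expressed differently.

```python
import os

# Environment variable and macro name taken from the PR description.
ENV_NAME = "CK_TILE_FLOAT_TO_BFLOAT16_DEFAULT"

# Mode values as listed in the PR's "Rounding Mode Options" section.
VALID_MODES = {
    0: "STANDARD",
    1: "TRUNCATE_WITH_NAN",
    2: "TRUNCATE",
    3: "STANDARD_ASM",
    4: "RTA_ASM",
}


def bf16_rounding_flag(env=None):
    """Hypothetical helper: build the -D compile flag from the environment.

    Falls back to 2 (TRUNCATE) when the variable is unset, matching the
    default stated in the PR. Rejecting unknown values is an assumption,
    not something the PR describes.
    """
    env = os.environ if env is None else env
    mode = int(env.get(ENV_NAME, "2"))
    if mode not in VALID_MODES:
        raise ValueError(f"{ENV_NAME} must be one of {sorted(VALID_MODES)}, got {mode}")
    return f"-D{ENV_NAME}={mode}"
```

Passing a plain dict instead of `os.environ` makes the helper easy to check without touching the real environment, for example `bf16_rounding_flag({"CK_TILE_FLOAT_TO_BFLOAT16_DEFAULT": "3"})`.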

Rounding Mode Options

| Value | Mode | Description |
|-------|------|-------------|
| 0 | STANDARD | RNE (Round to Nearest Even) - software implementation |
| 1 | TRUNCATE_WITH_NAN | Truncate with NaN preservation |
| 2 | TRUNCATE | Fast truncate (default) |
| 3 | STANDARD_ASM | RNE - optimized asm implementation |
| 4 | RTA_ASM | Round to Nearest Away - asm implementation |
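
To make the difference between the modes above concrete, here is a small pure-Python sketch of the three rounding rules (truncate, round to nearest even, round to nearest away) applied to FP32 bit patterns. It only mirrors the arithmetic for finite inputs; the function names are hypothetical and it is not the CK implementation, which also has to handle NaN.

```python
import struct


def f32_bits(x):
    """Return the IEEE 754 binary32 bit pattern of x as an unsigned int."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bf16_truncate(x):
    """Truncate rule (mode 2): keep the upper 16 bits, drop the lower 16."""
    return f32_bits(x) >> 16


def bf16_rne(x):
    """Round to nearest, ties to even (rule of modes 0 and 3); finite inputs only."""
    bits = f32_bits(x)
    # 0x7FFF plus the lowest kept bit: exact ties round up only when that
    # bit is 1, which leaves an even mantissa.
    bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + bias) >> 16) & 0xFFFF


def bf16_rta(x):
    """Round to nearest, ties away from zero (rule of mode 4); finite inputs only."""
    # FP32 is sign-magnitude, so adding half a unit always grows the magnitude.
    return ((f32_bits(x) + 0x8000) >> 16) & 0xFFFF


def bf16_to_float(h):
    """Expand a 16-bit BF16 pattern back to a Python float."""
    return struct.unpack("<f", struct.pack("<I", h << 16))[0]


# 1 + 2**-8 lies exactly halfway between two adjacent BF16 values.
tie = 1.00390625
# 1 + 2**-8 + 2**-9 lies above the halfway point.
above_half = 1.005859375
```

For the halfway value `tie`, truncation and RNE both give the pattern `0x3F80` (1.0) while RTA gives `0x3F81` (1.0078125); for `above_half`, truncation still gives `0x3F80` but both nearest-rounding rules give `0x3F81`.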

Usage

To use RNE (Round to Nearest Even) rounding mode:

export CK_TILE_FLOAT_TO_BFLOAT16_DEFAULT=3
# rebuild aiter

Motivation

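Different rounding modes may affect numerical precision in model inference. Truncation always rounds toward zero, so its error is biased in one direction. RNE, the IEEE 754 default rounding mode, rounds to the nearest representable value, which makes it the better choice for precision-sensitive workloads.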
