
Adding support for bias addition + rescaling with token weights to grouped_gemm#5280

Open
metastableB wants to merge 1 commit into pytorch:main from metastableB:export-D89699751

Conversation

@metastableB
Contributor


Summary:
Adds support for providing bias and token weights as optional arguments to fbgemm's Triton grouped GEMM. The changes were made in the `_grouped_gemm` protected kernel implementation and are exposed through a new public function, `grouped_gemm_bias_scale`; the original `grouped_gemm` signature remains untouched.
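As a rough illustration, a call to the new entry point might look like the sketch below. The summary does not reproduce the exact signature, so the argument names (`bias`, `token_weights`), tensor shapes, and import path are assumptions inferred from the description, not the landed API.

```python
# Usage sketch only: argument names, shapes, and the import path are
# assumptions inferred from the PR summary, not the actual signature.
import torch

from fbgemm_gpu.experimental.gemm.triton_gemm.grouped_gemm import (
    grouped_gemm_bias_scale,
)

G, K, N = 4, 256, 512                            # groups, inner dim, output dim
m_sizes = torch.tensor([128, 64, 32, 96], dtype=torch.int32, device="cuda")
M = int(m_sizes.sum())                           # total tokens across all groups

x = torch.randn(M, K, device="cuda", dtype=torch.bfloat16)          # stacked token activations
w = torch.randn(G * N, K, device="cuda", dtype=torch.bfloat16)      # per-group weights, stacked
bias = torch.randn(G, N, device="cuda", dtype=torch.bfloat16)       # per-group bias (assumed shape)
token_weights = torch.rand(M, device="cuda", dtype=torch.bfloat16)  # per-token scale (assumed shape)

# Fused per group g: (x_g @ w_g.T + bias_g) * token_weights_g
out = grouped_gemm_bias_scale(x, w, m_sizes, bias, token_weights)
```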

For internal testing, use:
```
buck test -c fbcode.nvcc_arch=h100a -c fbcode.enable_gpu_sections=true fbcode//deeplearning/fbgemm/fbgemm_gpu/experimental/gemm/test:grouped_gemm_test -- test_grouped_gemm_bias_scale
```

For basic benchmarking on H100 nodes, use:
```
buck run -c fbcode.nvcc_arch=h100a -c fbcode.enable_gpu_sections=true fbcode//deeplearning/fbgemm/fbgemm_gpu/experimental/gemm/test:grouped_gemm_bias_scale_benchmark 2>/dev/null
```

Benchmark Results:

| Config | fused (ms) | triton+torch (ms) | torch (ms) | Speedup vs torch | Speedup vs triton+torch |
|--------|-----------:|------------------:|-----------:|-----------------:|------------------------:|
| Small  | 0.009      | 0.017             | 0.027      | 3.03x            | 1.91x                   |
| Medium | 0.017      | 0.036             | 0.049      | 2.82x            | 2.04x                   |
| Large  | 0.048      | 0.091             | 0.142      | 2.97x            | 1.91x                   |
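For context on what is being fused: the unfused baselines presumably materialize the grouped GEMM output and then apply the bias and token-weight rescale as separate ops. A pure-PyTorch reference for the assumed semantics (a sketch, not the actual benchmark script) could look like:

```python
# Assumed semantics of bias addition + token-weight rescaling, written as an
# unfused per-group PyTorch loop. Shapes and op ordering are assumptions.
import torch

def grouped_gemm_bias_scale_ref(
    x: torch.Tensor,              # (M, K) stacked token activations
    w: torch.Tensor,              # (G * N, K) stacked per-group weights
    m_sizes: torch.Tensor,        # (G,) tokens per group, summing to M
    bias: torch.Tensor,           # (G, N) per-group bias (assumed shape)
    token_weights: torch.Tensor,  # (M,) per-token scale (assumed shape)
) -> torch.Tensor:
    G = m_sizes.numel()
    N = w.shape[0] // G
    outs, start = [], 0
    for g in range(G):
        end = start + int(m_sizes[g])
        yg = x[start:end] @ w[g * N : (g + 1) * N].T  # per-group GEMM
        yg = yg + bias[g]                             # bias addition
        yg = yg * token_weights[start:end, None]      # token-weight rescale
        outs.append(yg)
        start = end
    return torch.cat(outs, dim=0)
```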

Differential Revision: D89699751