Use torch._grouped_mm in eager mode by wujingyue · Pull Request #2721 · Lightning-AI/lightning-thunder

wujingyue · 2025-11-06T08:05:25Z

This gives a fair comparison between eager and other modes.

The constraints mentioned in the comment seem to have been fixed by pytorch/pytorch#161407.

python thunder/benchmarks/benchmark_inference.py at head runs fine on both Blackwell and Ampere.

Copilot

Pull Request Overview

This PR enables the use of torch._grouped_mm in eager mode for benchmarking purposes, providing a fair comparison between eager and other modes. Previously, the function was only used during compilation (via torch.compiler.is_compiling() check). The constraints that prevented eager mode usage have been resolved.

Key changes:

Replaced torch.compiler.is_compiling() check with availability check based on _grouped_mm variable
Added else clause to set _grouped_mm = None for torch versions < 2.8.0
Removed outdated comment about constraints requiring offsets to be multiples of 16
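The availability-based gating described in the key changes can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: `parse_major_minor`, `pick_grouped_mm`, and `run_experts` are hypothetical names, the `fallback` callable stands in for whatever non-grouped path the benchmark uses, and the two `SimpleNamespace` objects are stand-ins for torch installs so that the sketch runs without PyTorch.

```python
from types import SimpleNamespace


def parse_major_minor(version: str) -> tuple[int, int]:
    """Extract (major, minor) from a version string such as '2.8.0a0+git123'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def pick_grouped_mm(torch_like):
    """Return the grouped-matmul callable if the module provides one, else None."""
    if parse_major_minor(torch_like.__version__) >= (2, 8):
        return getattr(torch_like, "_grouped_mm", None)
    else:
        # Older versions: make the absence explicit so callers can test for None.
        return None


def run_experts(x, w, offsets, grouped_mm, fallback):
    """Dispatch on availability only, not on whether the code is being compiled."""
    if grouped_mm is not None:
        return grouped_mm(x, w, offsets)
    return fallback(x, w, offsets)


# Stand-ins for two torch installs (hypothetical; no real torch is imported here).
old_torch = SimpleNamespace(__version__="2.7.1")
new_torch = SimpleNamespace(
    __version__="2.8.0",
    _grouped_mm=lambda x, w, offsets: "grouped",
)

# Assumed fallback path, e.g. a per-expert loop in the real benchmark.
loop_fallback = lambda x, w, offsets: "loop"
```

With a real install, `pick_grouped_mm(torch)` would presumably be evaluated once at import time, so eager and compiled runs take the same branch in `run_experts`.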

thunder/benchmarks/layers_for_inference_benchmark.py

Co-authored-by: Masaki <mkozuki@nvidia.com>

crcrpar

looks good to me

mattteochen

Thanks!

Lightning-AI/lightning-thunder#2721

Use torch._grouped_mm in eager mode

ff9c62e

This gives a fair comparison between eager and other modes. The constraints mentioned in the comment seem to have been fixed at least for Blackwell.

wujingyue requested review from KaelanDt, lantiga, mruberry and t-vi as code owners November 6, 2025 08:05

wujingyue requested review from crcrpar and mattteochen November 6, 2025 08:05

crcrpar requested a review from Copilot November 11, 2025 04:50

Copilot started reviewing on behalf of crcrpar November 11, 2025 04:50

Copilot finished reviewing on behalf of crcrpar November 11, 2025 04:51

Copilot AI reviewed Nov 11, 2025

crcrpar approved these changes Nov 11, 2025

thunder/benchmarks/layers_for_inference_benchmark.py Outdated

Update layers_for_inference_benchmark.py

cbd61d9

Co-authored-by: Masaki <mkozuki@nvidia.com>

crcrpar approved these changes Nov 11, 2025

mattteochen approved these changes Nov 11, 2025

wujingyue enabled auto-merge (squash) November 11, 2025 15:58

tbqh added a commit to NVIDIA/Fuser that referenced this pull request Nov 21, 2025

Pull thunder PR "Use torch._grouped_mm in eager mode"

d8f6f57

Lightning-AI/lightning-thunder#2721

wujingyue mentioned this pull request Nov 21, 2025

Pull the Llama4 inference benchmark from lightning-thunder NVIDIA/Fuser#5578

Merged

tbqh added a commit to NVIDIA/Fuser that referenced this pull request Nov 21, 2025

Pull thunder PR "Use torch._grouped_mm in eager mode"

e181595

Lightning-AI/lightning-thunder#2721

Use torch._grouped_mm in eager mode #2721
wujingyue wants to merge 2 commits into main from wjy/gmm
