[AMD] MORI-EP inter kernel type switch by Duyi-Wang · Pull Request #18437 · sgl-project/sglang

Duyi-Wang · 2026-02-08T04:40:45Z

Motivation

MORI has recently implemented a new low-latency inter-node kernel, InterNodeV1LL, which delivers better performance than the existing InterNodeV1 when the number of tokens per rank is below 256.

Modifications

Since MORI does not yet support automatic kernel switching at runtime, the inter-node kernel is selected once, at initialization time, by comparing SGLANG_MORI_NUM_MAX_DISPATCH_TOKENS_PER_RANK against a switching threshold. The threshold is controlled by SGLANG_MORI_DISPATCH_INTER_KERNEL_SWITCH_THRESHOLD.
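
A minimal sketch of the selection described above, not the code in this PR. The enum `InterKernelType` and the helper `select_inter_kernel` are hypothetical names used only for illustration; the default threshold of 256 and the environment variable name come from this PR. The strict `<` comparison follows the "below 256" wording in the motivation, and the exact boundary behavior in moriep.py is an assumption here.

```python
import os
from enum import Enum


class InterKernelType(Enum):
    """Hypothetical stand-in for the MORI kernel-type values named in this PR."""

    INTER_NODE_V1 = "InterNodeV1"
    INTER_NODE_V1_LL = "InterNodeV1LL"


THRESHOLD_ENV = "SGLANG_MORI_DISPATCH_INTER_KERNEL_SWITCH_THRESHOLD"
DEFAULT_THRESHOLD = 256  # default value stated in the PR summary


def select_inter_kernel(num_max_dispatch_tokens_per_rank, env=None):
    """Pick the inter-node kernel once, at initialization time.

    `env` defaults to os.environ; passing a plain dict makes the
    function easy to exercise without touching the real environment.
    """
    env = os.environ if env is None else env
    threshold = int(env.get(THRESHOLD_ENV, DEFAULT_THRESHOLD))
    # Assumption: strictly below the threshold selects the low-latency kernel.
    if num_max_dispatch_tokens_per_rank < threshold:
        return InterKernelType.INTER_NODE_V1_LL
    return InterKernelType.INTER_NODE_V1


if __name__ == "__main__":
    print(select_inter_kernel(128, env={}).value)
    print(select_inter_kernel(512, env={}).value)
```

Because the choice is made at initialization, changing the threshold only takes effect after the dispatcher is re-created, for example by restarting the server with the new environment variable value.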

Accuracy Tests

Benchmarking and Profiling

With Token=128, fp8 dispatch, and bf16 combine:
Dispatch averages 1.52x and combine averages 1.82x relative to the previous version.
Before / After: (benchmark screenshots not preserved)

gemini-code-assist · 2026-02-08T04:41:01Z

Summary of Changes

Hello @Duyi-Wang, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request adds a configurable mechanism to the MORI-EP dispatcher for choosing the inter-node kernel. At initialization time it selects either InterNodeV1 or the low-latency InterNodeV1LL kernel based on the maximum number of dispatch tokens per rank, so that small workloads use the faster kernel. The switching threshold is controlled by a new environment variable, and the underlying MORI library dependency has also been updated.

Highlights

MORI Dependency Update: The MORI dependency has been updated to a newer commit (20920706a9004018dbd87c7387f207d08d0e05af) in the Dockerfile.
Inter-Kernel Type Switching: Introduced logic to select between the InterNodeV1 and InterNodeV1LL kernel types for inter-node dispatch at initialization time, based on the maximum number of dispatch tokens per rank. InterNodeV1LL is preferred for lower token counts (below 256), where it performs better.
New Environment Variable: A new environment variable, SGLANG_MORI_DISPATCH_INTER_KERNEL_SWITCH_THRESHOLD, has been added to control the threshold for this kernel switching, with a default value of 256.
Documentation Update: The new environment variable and its purpose have been documented in docs/references/environment_variables.md.

Changelog

docker/rocm.Dockerfile
- Updated the MORI_COMMIT hash to a newer version (from b0dce4beebeb1f26c784eee17d5fd9785ee9447f to 20920706a9004018dbd87c7387f207d08d0e05af).
docs/references/environment_variables.md
- Added documentation for the new environment variable SGLANG_MORI_DISPATCH_INTER_KERNEL_SWITCH_THRESHOLD, explaining its role in kernel type selection and its default value.
python/sglang/srt/layers/moe/token_dispatcher/moriep.py
- Modified the get_ep_dispatch_configs function to accept num_max_dispatch_tokens_per_rank as an argument.
- Implemented logic within get_ep_dispatch_configs to read SGLANG_MORI_DISPATCH_INTER_KERNEL_SWITCH_THRESHOLD and determine the appropriate inter-node kernel type (InterNodeV1 or InterNodeV1LL) based on a comparison with num_max_dispatch_tokens_per_rank.
- Updated the init_mori_op function to pass num_max_dispatch_tokens_per_rank to get_ep_dispatch_configs.

Activity

The pull request was initiated by Duyi-Wang.
It introduces a new feature for MORI-EP that selects the inter-node kernel type at initialization time for performance optimization.
A new environment variable has been introduced and documented to control this kernel switching behavior.
The underlying MORI library dependency has been updated.
The author has provided a clear motivation for the changes, highlighting the performance benefits of the new InterNodeV1LL kernel for specific token counts.

gemini-code-assist

Code Review

This pull request introduces a mechanism to switch between two MORI inter-node kernels, InterNodeV1 and InterNodeV1LL, based on the number of tokens per rank. This is a valuable optimization to leverage the new low-latency kernel for smaller workloads. The implementation correctly uses environment variables to control this behavior. My feedback includes suggestions to improve the clarity of the documentation and code comments for the new configuration options, and to fix a minor typo for better code quality.

docs/references/environment_variables.md

python/sglang/srt/layers/moe/token_dispatcher/moriep.py

Duyi-Wang added 3 commits February 8, 2026 12:23
- Add mori inter kernel auto detected. (b480263)
- update mori (0c8f98d)
- update doc (feff5d3)

Duyi-Wang requested review from BBuf, Edwardf0t1, Fridge003, HaiShaw, Ying1123, ch-wan, ishandhanani, ispobock and merrymercy as code owners February 8, 2026 04:40

github-actions bot added the documentation and amd labels Feb 8, 2026

gemini-code-assist bot reviewed Feb 8, 2026

- docs/references/environment_variables.md (outdated, resolved)
- python/sglang/srt/layers/moe/token_dispatcher/moriep.py (outdated, resolved)

Duyi-Wang and others added 3 commits February 8, 2026 12:44
- Update docs/references/environment_variables.md (22c3f5d)
  Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
- Update python/sglang/srt/layers/moe/token_dispatcher/moriep.py (c120296)
  Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
- Merge commit '4f7422f7bada1f1d83b2cadc3ae3e609820f0e41' into mori_ep_inter_kernel_switch (ce376b9)

Duyi-Wang wants to merge 6 commits into sgl-project:main from Duyi-Wang:mori_ep_inter_kernel_switch
