[AMD] MORI-EP inter kernel type switch #18437

Open
Duyi-Wang wants to merge 6 commits into sgl-project:main from Duyi-Wang:mori_ep_inter_kernel_switch

@Duyi-Wang Duyi-Wang commented Feb 8, 2026

Motivation

MORI has recently implemented a new low-latency inter-node kernel, InterNodeV1LL, which delivers better performance than the existing InterNodeV1 when the number of tokens per rank is below 256.

Modifications

Since MORI does not yet support automatic kernel switching at runtime, the inter-node kernel type is selected once at initialization time, based on SGLANG_MORI_NUM_MAX_DISPATCH_TOKENS_PER_RANK. The switching threshold is controlled by SGLANG_MORI_DISPATCH_INTER_KERNEL_SWITCH_THRESHOLD.
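
As a rough illustration of the selection rule described above, the sketch below picks a kernel type once from the configured maximum tokens per rank. The enum, the function name, and the strict `<` comparison are assumptions made for illustration; the actual code in moriep.py and MORI's own kernel-type enum may differ.

```python
from enum import Enum


class InterKernelType(Enum):
    # Hypothetical stand-ins for the two MORI kernel types named in this PR.
    INTER_NODE_V1 = "InterNodeV1"
    INTER_NODE_V1_LL = "InterNodeV1LL"


def select_inter_kernel(
    num_max_dispatch_tokens_per_rank: int,
    switch_threshold: int = 256,
) -> InterKernelType:
    """Choose the inter-node kernel once, at initialization time."""
    # Assumption: strict "<"; the PR text only says "below 256".
    if num_max_dispatch_tokens_per_rank < switch_threshold:
        return InterKernelType.INTER_NODE_V1_LL
    return InterKernelType.INTER_NODE_V1
```

Under these assumptions, a deployment configured for 128 tokens per rank would get the low-latency kernel, while one configured for 512 would keep InterNodeV1.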

Accuracy Tests

Benchmarking and Profiling

For Token=128, dispatch FP8, combine BF16:
Dispatch averages 1.52x and combine averages 1.82x relative to the previous version.
Before: (2 images)
After: (2 images)

Checklist

Review Process

  1. Ping Merge Oncalls to start the PR flow. See the PR Merge Process.
  2. Get approvals from CODEOWNERS and other reviewers.
  3. Trigger CI tests with comments or contact authorized users to do so.
    • /tag-run-ci-label, /rerun-failed-ci, /tag-and-rerun-ci
  4. After green CI and required approvals, ask Merge Oncalls to merge.

@github-actions github-actions bot added the documentation (Improvements or additions to documentation) and amd labels Feb 8, 2026
@gemini-code-assist

Summary of Changes

Hello @Duyi-Wang, I'm Gemini Code Assist[1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances the MORI-EP dispatch system by implementing a configurable mechanism to select the optimal inter-node kernel. It allows the system to switch between InterNodeV1 and the more performant InterNodeV1LL kernel based on the number of dispatch tokens per rank, thereby optimizing performance for varying workloads. This selection is controlled by a new environment variable, and the underlying MORI library dependency has also been updated.

Highlights

  • MORI Dependency Update: The MORI dependency has been updated to a newer commit (20920706a9004018dbd87c7387f207d08d0e05af) in the Dockerfile.
  • Inter-Kernel Type Switching: Introduced logic to select between the InterNodeV1 and InterNodeV1LL kernel types for inter-node dispatch at initialization time, based on the maximum number of dispatch tokens per rank. InterNodeV1LL is preferred for lower token counts (below 256) for better performance.
  • New Environment Variable: A new environment variable, SGLANG_MORI_DISPATCH_INTER_KERNEL_SWITCH_THRESHOLD, has been added to control the threshold for this kernel switching, with a default value of 256.
  • Documentation Update: The new environment variable and its purpose have been documented in docs/references/environment_variables.md.
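
One way the threshold variable could be read is sketched below. The helper name and the validation rules are hypothetical; SGLang has its own environment-variable handling, which this sketch does not attempt to reproduce. Only the variable name and the default of 256 come from this PR.

```python
import os
from typing import Mapping, Optional

THRESHOLD_ENV = "SGLANG_MORI_DISPATCH_INTER_KERNEL_SWITCH_THRESHOLD"
DEFAULT_THRESHOLD = 256  # default value stated in this PR


def read_switch_threshold(environ: Optional[Mapping[str, str]] = None) -> int:
    """Hypothetical helper: return the switch threshold, or the default if unset."""
    env = os.environ if environ is None else environ
    raw = env.get(THRESHOLD_ENV)
    if raw is None or raw.strip() == "":
        return DEFAULT_THRESHOLD
    value = int(raw)  # raises ValueError for non-integer input
    if value < 0:
        raise ValueError(f"{THRESHOLD_ENV} must be non-negative, got {value}")
    return value
```

Passing a mapping instead of reading `os.environ` directly keeps the helper easy to test, for example `read_switch_threshold({THRESHOLD_ENV: "128"})`.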

Changelog
  • docker/rocm.Dockerfile
    • Updated the MORI_COMMIT hash to a newer version (from b0dce4beebeb1f26c784eee17d5fd9785ee9447f to 20920706a9004018dbd87c7387f207d08d0e05af).
  • docs/references/environment_variables.md
    • Added documentation for the new environment variable SGLANG_MORI_DISPATCH_INTER_KERNEL_SWITCH_THRESHOLD, explaining its role in kernel type selection and its default value.
  • python/sglang/srt/layers/moe/token_dispatcher/moriep.py
    • Modified the get_ep_dispatch_configs function to accept num_max_dispatch_tokens_per_rank as an argument.
    • Implemented logic within get_ep_dispatch_configs to read SGLANG_MORI_DISPATCH_INTER_KERNEL_SWITCH_THRESHOLD and determine the appropriate inter-node kernel type (InterNodeV1 or InterNodeV1LL) based on a comparison with num_max_dispatch_tokens_per_rank.
    • Updated the init_mori_op function to pass num_max_dispatch_tokens_per_rank to get_ep_dispatch_configs.
Activity
  • The pull request was initiated by Duyi-Wang.
  • It introduces a new feature for MORI-EP that selects the inter-node kernel type at initialization time for performance optimization.
  • A new environment variable has been introduced and documented to control this kernel switching behavior.
  • The underlying MORI library dependency has been updated.
  • The author has provided a clear motivation for the changes, highlighting the performance benefits of the new InterNodeV1LL kernel for specific token counts.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a mechanism to switch between two MORI inter-node kernels, InterNodeV1 and InterNodeV1LL, based on the number of tokens per rank. This is a valuable optimization to leverage the new low-latency kernel for smaller workloads. The implementation correctly uses environment variables to control this behavior. My feedback includes suggestions to improve the clarity of the documentation and code comments for the new configuration options, and to fix a minor typo for better code quality.

Duyi-Wang and others added 3 commits February 8, 2026 12:44
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>