single node AG/RS enable warp speed#3134

Open
isaki001 wants to merge 3 commits into develop from users/isaki001/single_node_ag_rs_enable_warpSpeed

isaki001 commented Feb 6, 2026

Motivation

Enable warpSpeed for single-node AllGather and ReduceScatter on gfx950 to take advantage of halving the number of CUs used.

Technical Details

  • Switch to single-slice, single chunk-step, and unroll=2 for AllReduce and AllGather on gfx950
  • For single-node gfx950 AllGather:
      • warpSpeed is enabled for message sizes >= 128MB
      • CU reduction applies even when warpSpeed is not enabled
      • 1-3% regression observed at 32MB-64MB compared with AllGather at unroll=1, CU=112, warpSpeed disabled
  • For single-node gfx950 ReduceScatter:
      • warpSpeed is enabled for message sizes >= 2GB
      • CU reduction applies even when warpSpeed is not enabled
      • 18-24% regression observed at 1MB-4MB compared with ReduceScatter at unroll=1, CU=112, warpSpeed disabled
      • Switching the protocol from LL to Simple limits this regression to 13-18% for 2MB-4MB
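
The size thresholds above can be sketched as a small selection function. This is only an illustration of the rules as described, not the code in this PR: `Collective`, `LaunchHint`, `warpSpeedThreshold`, and `chooseHint` are hypothetical names, the caller is assumed to have already checked for single-node gfx950, and MB/GB are assumed to be binary units applied to whatever "message size" the tuner uses.

```cpp
#include <cstdint>

// Hypothetical types for illustration; these are not RCCL's actual tuning structures.
enum class Collective { AllGather, ReduceScatter };

struct LaunchHint {
  bool warpSpeed;   // whether warpSpeed would be enabled for this message size
  bool reducedCUs;  // whether the reduced CU count applies
};

// Assumption: MB and GB in the description mean 2^20 and 2^30 bytes.
constexpr std::uint64_t MiB = 1ull << 20;
constexpr std::uint64_t GiB = 1ull << 30;

// Smallest message size at which warpSpeed is enabled, per the list above.
constexpr std::uint64_t warpSpeedThreshold(Collective c) {
  return c == Collective::AllGather ? 128 * MiB : 2 * GiB;
}

// Assumes the caller has already established single-node gfx950.
// The CU reduction applies regardless of whether warpSpeed is enabled.
constexpr LaunchHint chooseHint(Collective c, std::uint64_t bytes) {
  return LaunchHint{bytes >= warpSpeedThreshold(c), true};
}
```

Under these assumptions, `chooseHint(Collective::ReduceScatter, 1 * GiB)` would leave warpSpeed off while still applying the CU reduction, which is the regime where the regressions listed above were measured.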

JIRA ID

AICOMRCCL-415
AICOMRCCL-416

Test Plan

gfx950 correctness and performance tests

Test Result

Submission Checklist
