Best practice: GPU & RDMA Joint Allocation #192

Merged

koordinator-bot[bot] merged 1 commit into koordinator-sh:main from ferris-cx:rdma-end2end

Feb 18, 2025

Contributor

ferris-cx commented Dec 17, 2024

Ⅰ. Describe what this PR does
Because GPUs in AI scenarios rely on RDMA NICs for high-speed NCCL communication, end-to-end support for RDMA devices is needed, covering device discovery, device registration, node resource updates, scheduling, and allocation.
Ⅱ. Does this pull request fix one issue?
No
Ⅲ. Describe how to verify it
Ⅳ. Special notes for reviews
Ⅴ. Checklist
I have written necessary docs and comments
I have added necessary unit tests and integration tests
All checks passed in make test

ZiMengSheng force-pushed the rdma-end2end branch from f861327 to c18a8df Compare

December 17, 2024 09:13

ferris-cx force-pushed the rdma-end2end branch from c18a8df to 2b05712 Compare

December 17, 2024 09:24

ZiMengSheng reviewed

docs/best-practices/gpu-and-rdma-joint-allocation.md (outdated, resolved)

ZiMengSheng reviewed

docs/best-practices/gpu-and-rdma-joint-allocation.md (outdated, resolved)

ZiMengSheng reviewed

docs/best-practices/gpu-and-rdma-joint-allocation.md (outdated, resolved)

ferris-cx force-pushed the rdma-end2end branch from 2b05712 to 1f0c4f9 Compare

December 19, 2024 03:12

ZiMengSheng reviewed

docs/best-practices/gpu-and-rdma-joint-allocation.md (outdated, resolved)

ZiMengSheng force-pushed the rdma-end2end branch 3 times, most recently from d68042b to 7da5817 Compare

December 19, 2024 12:52

saintube reviewed

docs/best-practices/gpu-and-rdma-joint-allocation.md (outdated, resolved)

saintube changed the title ~~Best practice~~ Best practice: GPU & RDMA Joint Allocation

songtao98 reviewed

docs/best-practices/gpu-and-rdma-joint-allocation.md (resolved)

ZiMengSheng force-pushed the rdma-end2end branch 4 times, most recently from 2cc1a92 to 8bcbb07 Compare

February 18, 2025 09:24

saintube reviewed

docs/best-practices/gpu-and-rdma-joint-allocation.md (resolved)

docs/best-practices/gpu-and-rdma-joint-allocation.md (resolved)

ZiMengSheng force-pushed the rdma-end2end branch from 8bcbb07 to 8f453cb Compare

February 18, 2025 10:03


best pratices on gpu-rdma new

82e1851

Signed-off-by: iostream2008@163.com <iostream2008@163.com>
Signed-off-by: wangjianyu <wangjianyu.wjy@alibaba-inc.com>

ZiMengSheng force-pushed the rdma-end2end branch from 8f453cb to 82e1851 Compare

February 18, 2025 10:15

saintube approved these changes

Member

saintube left a comment

/lgtm

saintube added the lgtm label

Contributor

songtao98 commented Feb 18, 2025

/lgtm

ZiMengSheng added the approved label

koordinator-bot bot merged commit 1632b6e into koordinator-sh:main

4 checks passed
