Best practice: GPU & RDMA Joint Allocation #192

Merged
koordinator-bot[bot] merged 1 commit into koordinator-sh:main from ferris-cx:rdma-end2end on Feb 18, 2025

Conversation

@ferris-cx
Contributor

Ⅰ. Describe what this PR does
Since GPUs in AI scenarios require RDMA NICs for high-speed NCCL communication, end-to-end support for RDMA devices must be added, covering device discovery, device registration, node resource updates, scheduling, and allocation.
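For illustration only, here is a minimal sketch of what joint GPU & RDMA allocation could look like from the user's side. The resource names (koordinator.sh/gpu, koordinator.sh/rdma), the device-joint-allocate annotation, and the container image are assumptions for this sketch, not confirmed by this PR:

```yaml
# Hypothetical pod spec: ask the Koordinator scheduler to allocate GPUs
# together with an RDMA NIC on the same node.
# Assumed names: koordinator.sh/gpu, koordinator.sh/rdma, and the
# scheduling.koordinator.sh/device-joint-allocate annotation.
apiVersion: v1
kind: Pod
metadata:
  name: nccl-worker
  annotations:
    # Assumed hint asking the scheduler to co-allocate GPU and RDMA
    # devices with nearby topology (e.g. same PCIe switch / NUMA node).
    scheduling.koordinator.sh/device-joint-allocate: '{"deviceTypes": ["gpu", "rdma"]}'
spec:
  schedulerName: koord-scheduler
  containers:
    - name: trainer
      image: nvcr.io/nvidia/pytorch:24.01-py3  # placeholder training image
      resources:
        limits:
          koordinator.sh/gpu: "200"   # two full GPUs (100 units per GPU)
          koordinator.sh/rdma: "100"  # one RDMA NIC
```

Under this assumed model, the scheduler would place the GPUs and the RDMA NIC on the same node, ideally with matching PCIe/NUMA topology, so that NCCL traffic can take the high-speed RDMA path.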
Ⅱ. Does this pull request fix one issue?
No
Ⅲ. Describe how to verify it
Ⅳ. Special notes for reviews
Ⅴ. Checklist
I have written necessary docs and comments
I have added necessary unit tests and integration tests
All checks passed in make test

@ZiMengSheng ZiMengSheng force-pushed the rdma-end2end branch 3 times, most recently from d68042b to 7da5817 on December 19, 2024 12:52
@saintube saintube changed the title from "Best practice" to "Best practice: GPU & RDMA Joint Allocation" on Dec 20, 2024
@ZiMengSheng ZiMengSheng force-pushed the rdma-end2end branch 4 times, most recently from 2cc1a92 to 8bcbb07 on February 18, 2025 09:24
Signed-off-by: iostream2008@163.com <iostream2008@163.com>
Signed-off-by: wangjianyu <wangjianyu.wjy@alibaba-inc.com>
Member

@saintube saintube left a comment


/lgtm

@saintube saintube added the lgtm label Feb 18, 2025
@songtao98
Contributor

/lgtm

@koordinator-bot koordinator-bot bot merged commit 1632b6e into koordinator-sh:main Feb 18, 2025
4 checks passed