Open
Description
Problem: While testing copy_bw on an A100 SXM, I noticed two behaviors that I would like help explaining:
- H2D is consistently faster than cuMemcpy at small data sizes, as expected, but at large data sizes its bandwidth stops growing once it reaches 20 GB/s. What causes this plateau?
- D2H is much slower than H2D and never reaches high bandwidth, even when the data size is large. My guess for the first part (D2H being slower than H2D) is that the CPU needs an extra dispatch trip before the GPU starts the D2H copy, but I have no explanation for the second part (the bandwidth staying low at large sizes).
Could anyone help with these questions?
