GDR D2H much slower than H2D and cudaMemcpy #315

@shenh10

Problem: When I tested copy_bw on an A100 SXM, I noticed some behavior that I need help explaining:

  1. H2D is consistently faster than cuMemcpy for small sizes, as expected, but for large sizes its bandwidth stops growing once it reaches about 20 GB/s. What is the reason for this limit?
  2. D2H is much slower than H2D and never reaches high bandwidth, even when the data size is large. My guess for the first observation is that the CPU needs an extra dispatch round trip to get the GPU to start the D2H transfer, but I have no idea about the second one.

Could anyone help with the questions above?

(Image attached: copy_bw results)
