host_to_device_memcpy_sm lower than device_to_host_memcpy_sm #24

@ywxc1997

Hi!
When we ran host_to_device_memcpy_sm and device_to_host_memcpy_sm separately on an H100 cluster, we got two very different bandwidth values:

Running host_to_device_memcpy_sm.
memcpy SM CPU(row) -> GPU(column) bandwidth (GB/s)
           0         1         2         3         4         5         6         7
 0     35.19     35.25     35.30     35.03     35.25     35.32     35.39     35.06

Running device_to_host_memcpy_sm.
memcpy SM CPU(row) <- GPU(column) bandwidth (GB/s)
           0         1         2         3         4         5         6         7
 0     52.77     52.77     52.77     52.78     52.76     52.77     52.78     52.77
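
To put a number on the gap, the two rows above can be averaged and compared. This is only a minimal sketch over the figures reported in this issue; the variable names (`h2d`, `d2h`, `ratio`) are made up for illustration and are not part of nvbandwidth.

```python
from statistics import mean

# Bandwidth in GB/s for GPUs 0..7, copied from the two runs above.
h2d = [35.19, 35.25, 35.30, 35.03, 35.25, 35.32, 35.39, 35.06]  # host_to_device_memcpy_sm
d2h = [52.77, 52.77, 52.77, 52.78, 52.76, 52.77, 52.78, 52.77]  # device_to_host_memcpy_sm

h2d_avg = mean(h2d)
d2h_avg = mean(d2h)

# Ratio of device-to-host over host-to-device average bandwidth.
ratio = d2h_avg / h2d_avg

print(f"H2D average: {h2d_avg:.2f} GB/s")
print(f"D2H average: {d2h_avg:.2f} GB/s")
print(f"D2H / H2D:   {ratio:.2f}x")
```

On these numbers, device-to-host comes out roughly 1.5 times faster than host-to-device, and the result is consistent across all eight GPUs.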

We expected the two directions to report similar bandwidth.
What could be causing this difference?
