
[Issue]: Intel Granite Rapids (GNR) are misdetected with lower UPI InterCpuBw #2107

@avnf

Description

How is this issue impacting you?

Lower performance than expected

Share Your Debug Logs

Hello

The current version of the ncclTopoGetInterCpuBw function has no support for the GNR family of Intel Xeon CPUs:
https://github.com/NVIDIA/nccl/blob/v2.30.3-1/src/graph/topo.cc#L73

  if (cpu->cpu.arch == NCCL_TOPO_CPU_ARCH_X86 && cpu->cpu.vendor == NCCL_TOPO_CPU_VENDOR_INTEL) {
    *bw =
      cpu->cpu.model == NCCL_TOPO_CPU_MODEL_INTEL_ERP ? ERP_QPI_BW :
      cpu->cpu.model == NCCL_TOPO_CPU_MODEL_INTEL_SRP ? SRP_QPI_BW :
      cpu->cpu.model == NCCL_TOPO_CPU_MODEL_INTEL_SKL ? SKL_QPI_BW :
      BDW_QPI_BW;
  }

I think that familyId == 6 && modelId == 0xAD will detect GNR Xeon chips. They have a UPI speed of 24 GT/s per channel, with multiple UPI links between sockets:
https://www.intel.com/content/www/us/en/products/sku/242668/intel-xeon-6507p-processor-48m-cache-3-50-ghz/specifications.html
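
A minimal sketch of the detection condition suggested above, not NCCL's actual code. The helper name isGraniteRapids and the constant names are hypothetical. Model 0xAE is only accepted through an opt-in flag, because it is unconfirmed whether it should be treated the same way.

```cpp
// Hypothetical constants; only the numeric values 6, 0xAD and 0xAE come from the issue text.
constexpr int kIntelFamily6 = 6;
constexpr int kModelGraniteRapids = 0xAD;   // GNR Xeon, as proposed in the issue
constexpr int kModelGraniteRapidsD = 0xAE;  // mentioned by some sources, probably single socket

// Returns true when the (familyId, modelId) pair matches the proposed GNR check.
// includeD is an assumption of this sketch: it lets the caller also accept 0xAE.
constexpr bool isGraniteRapids(int familyId, int modelId, bool includeD = false) {
  return familyId == kIntelFamily6 &&
         (modelId == kModelGraniteRapids ||
          (includeD && modelId == kModelGraniteRapidsD));
}
```

Keeping 0xAE behind a flag means a single-socket part is not assigned an inter-socket bandwidth it would never use.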

I think that for the NCCL graph this corresponds to a GNR_QPI_BW of 48.0.
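
The proposed change could look roughly like the following self-contained sketch, which mirrors the ternary chain quoted above with one extra GNR branch. It is not a patch against NCCL: the IntelCpuModel enum and the interCpuBw function are hypothetical stand-ins, GNR_QPI_BW is the name proposed in this issue, and every bandwidth value other than 48.0 is illustrative and should be checked against the NCCL source.

```cpp
// Hypothetical stand-in for NCCL's NCCL_TOPO_CPU_MODEL_INTEL_* constants.
enum class IntelCpuModel { BDW, SKL, SRP, ERP, GNR };

// Illustrative values; only GNR_QPI_BW = 48.0 is taken from this issue.
constexpr float BDW_QPI_BW = 6.0f;
constexpr float SKL_QPI_BW = 10.0f;
constexpr float SRP_QPI_BW = 22.0f;
constexpr float ERP_QPI_BW = 40.0f;
constexpr float GNR_QPI_BW = 48.0f;  // proposed in the issue

// Same shape as the chain in ncclTopoGetInterCpuBw, with a GNR case added first.
// Unknown or older models still fall through to the BDW default.
constexpr float interCpuBw(IntelCpuModel model) {
  return model == IntelCpuModel::GNR ? GNR_QPI_BW :
         model == IntelCpuModel::ERP ? ERP_QPI_BW :
         model == IntelCpuModel::SRP ? SRP_QPI_BW :
         model == IntelCpuModel::SKL ? SKL_QPI_BW :
         BDW_QPI_BW;
}
```

For this to take effect in practice, the model detection would also need to map GNR to its own model constant rather than to an older one.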

Some sources also mention modelId 0xAE as GRANITERAPIDS D, but those chips are probably single-socket only.

The current version may allocate fewer channels on 2-NUMA GNR machines with multiple PCIe-only GPUs (no NVLink). I saw 'SYS[22.0]' in the NCCL_DEBUG output with the current code and 'SYS[48.0]' after the fix, and the busbw of all_reduce_perf improved with the fix.

Steps to Reproduce the Issue

No response

NCCL Version

2.30.3

Your platform details

No response

Error Message & Behavior

No response
