[Issue]: Intel Granite Rapids (GNR) CPUs are misdetected with lower UPI InterCpuBw #2107

### How is this issue impacting you?

Lower performance than expected

### Share Your Debug Logs

Hello

The current version of the `ncclTopoGetInterCpuBw` function does not support the GNR family of Intel Xeon CPUs:
https://github.com/NVIDIA/nccl/blob/v2.30.3-1/src/graph/topo.cc#L73

```cpp
  if (cpu->cpu.arch == NCCL_TOPO_CPU_ARCH_X86 && cpu->cpu.vendor == NCCL_TOPO_CPU_VENDOR_INTEL) {
    *bw =
      cpu->cpu.model == NCCL_TOPO_CPU_MODEL_INTEL_ERP ? ERP_QPI_BW :
      cpu->cpu.model == NCCL_TOPO_CPU_MODEL_INTEL_SRP ? SRP_QPI_BW :
      cpu->cpu.model == NCCL_TOPO_CPU_MODEL_INTEL_SKL ? SKL_QPI_BW :
      BDW_QPI_BW;
  }
```
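A minimal sketch of the proposed change extends the same ternary chain with a GNR case. `NCCL_TOPO_CPU_MODEL_INTEL_GNR`, `GNR_QPI_BW` and `intelInterCpuBw` are hypothetical names that do not exist upstream. The enum is a stand-in for NCCL's internal model IDs, and the non-GNR bandwidth constants are assumed placeholder values that should be checked against NCCL's `topo.h`. The code that sets `cpu->cpu.model` would presumably also need a matching GNR case.

```cpp
// Hypothetical stand-ins for NCCL's internal CPU model IDs (values are not NCCL's).
enum CpuModel {
  NCCL_TOPO_CPU_MODEL_INTEL_BDW,
  NCCL_TOPO_CPU_MODEL_INTEL_SKL,
  NCCL_TOPO_CPU_MODEL_INTEL_SRP,
  NCCL_TOPO_CPU_MODEL_INTEL_ERP,
  NCCL_TOPO_CPU_MODEL_INTEL_GNR,  // proposed, does not exist upstream
};

// Inter-socket bandwidths in GB/s. GNR_QPI_BW = 48.0 is the value proposed in
// this issue; the other four are assumed placeholders, verify against topo.h.
constexpr double BDW_QPI_BW = 6.0;
constexpr double SKL_QPI_BW = 10.0;
constexpr double SRP_QPI_BW = 22.0;
constexpr double ERP_QPI_BW = 40.0;
constexpr double GNR_QPI_BW = 48.0;

// Mirrors the Intel branch of ncclTopoGetInterCpuBw with one extra GNR case.
// Unknown models still fall back to BDW_QPI_BW, as in the original chain.
constexpr double intelInterCpuBw(CpuModel model) {
  return model == NCCL_TOPO_CPU_MODEL_INTEL_GNR ? GNR_QPI_BW :
         model == NCCL_TOPO_CPU_MODEL_INTEL_ERP ? ERP_QPI_BW :
         model == NCCL_TOPO_CPU_MODEL_INTEL_SRP ? SRP_QPI_BW :
         model == NCCL_TOPO_CPU_MODEL_INTEL_SKL ? SKL_QPI_BW :
         BDW_QPI_BW;
}
```

With these definitions, `intelInterCpuBw(NCCL_TOPO_CPU_MODEL_INTEL_GNR)` evaluates to 48.0 at compile time. Because every branch is an equality test, the position of the new case in the chain does not affect the result.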

I think that `familyId == 6 && modelId == 0xAD` will detect GNR Xeon chips. They have a UPI speed of 24 GT/s per link, with multiple UPI links between sockets:
https://www.intel.com/content/www/us/en/products/sku/242668/intel-xeon-6507p-processor-48m-cache-3-50-ghz/specifications.html

I think that for the NCCL topology graph this corresponds to a `GNR_QPI_BW` of 48.0.
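One plausible derivation of 48.0, assuming the constant models a single UPI link in one direction and that each link carries 2 bytes of data per transfer in each direction (both are assumptions, not stated above):

```math
24\ \mathrm{GT/s} \times 2\ \mathrm{B/transfer} = 48\ \mathrm{GB/s}
```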

Some [sources](https://github.com/torvalds/linux/blob/v7.0/arch/x86/include/asm/intel-family.h#L126) also list modelId 0xAE as GRANITERAPIDS_D, but those parts are probably [single-socket only](https://www.intel.com/content/www/us/en/ark/products/codename/228655/products-formerly-granite-rapidsd.html#@Server).
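A minimal sketch of the detection check, assuming the family and model IDs are decoded from CPUID leaf 1 `EAX` with Intel's documented rules: the extended model is prepended only for families 6 and 15, and the extended family is added only for family 15. `CpuSignature`, `decodeCpuid1Eax` and `isGraniteRapidsXeon` are hypothetical helpers, not NCCL functions; how NCCL itself obtains `familyId` and `modelId` is not shown here. Model 0xAE is deliberately excluded, following the note above that GRANITERAPIDS_D parts are probably single-socket.

```cpp
#include <cstdint>

// Display family/model as derived from CPUID leaf 1 EAX (hypothetical helper type).
struct CpuSignature {
  uint32_t family;
  uint32_t model;
};

// Decode EAX of CPUID leaf 1 using Intel's DisplayFamily/DisplayModel rules.
constexpr CpuSignature decodeCpuid1Eax(uint32_t eax) {
  const uint32_t baseModel  = (eax >> 4) & 0xF;    // bits 7:4
  const uint32_t baseFamily = (eax >> 8) & 0xF;    // bits 11:8
  const uint32_t extModel   = (eax >> 16) & 0xF;   // bits 19:16
  const uint32_t extFamily  = (eax >> 20) & 0xFF;  // bits 27:20
  const uint32_t family = baseFamily == 0xF ? baseFamily + extFamily : baseFamily;
  const uint32_t model = (baseFamily == 0x6 || baseFamily == 0xF)
                             ? (extModel << 4) | baseModel
                             : baseModel;
  return CpuSignature{family, model};
}

// Family 6, model 0xAD: multi-socket Granite Rapids Xeon, as proposed in this issue.
// Model 0xAE (GRANITERAPIDS_D) is intentionally not matched.
constexpr bool isGraniteRapidsXeon(CpuSignature sig) {
  return sig.family == 6 && sig.model == 0xAD;
}
```

For example, a synthetic `EAX` of `0x000A06D1` (extended model 0xA, family 6, model 0xD, stepping 1) decodes to family 6, model 0xAD and is classified as GNR, while `0x000A06E0` decodes to model 0xAE and is not. These values are constructed for illustration, not captured from real hardware. `constexpr` here requires C++14 or later.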

The current version may allocate fewer channels on GNR machines with two NUMA nodes and multiple PCIe-only GPUs (no NVLink). With the current code, the `NCCL_DEBUG` output showed `SYS[22.0]`; after the fix it showed `SYS[48.0]`, and the busbw reported by `all_reduce_perf` improved.

### Steps to Reproduce the Issue

_No response_

### NCCL Version

2.30.3

### Your platform details

_No response_

### Error Message & Behavior

_No response_
