[Issue]: Small-shape all-reduce performance regression between NCCL 2.28 and 2.29 on an H100 machine #2094

@PatriosTheGreat

Description

How is this issue impacting you?

Lower performance than expected

Share Your Debug Logs

Logs from a run with NCCL 2.29 are below. I can also attach the logs from NCCL 2.28 if needed.

40983aab902f:53830:53830 [0] NCCL INFO ENV/Plugin: Could not find: libnccl-env.so
40983aab902f:53830:53830 [0] NCCL INFO Bootstrap: Using eth0:192.168.9.2<0>
40983aab902f:53830:53830 [0] NCCL INFO cudaDriverVersion 12040
40983aab902f:53830:53830 [0] NCCL INFO NCCL version 2.29.7+cuda12.3
40983aab902f:53830:53830 [0] NCCL INFO NCCL git version unknown unknown
40983aab902f:53830:53846 [5] NCCL INFO NET/Plugin: Could not find: libnccl-net.so
40983aab902f:53830:53846 [5] NCCL INFO Failed to open libibverbs.so[.1]
40983aab902f:53830:53846 [5] NCCL INFO transport/net_ib/init.cc:396 -> 3
40983aab902f:53830:53846 [5] NCCL INFO Failed to initialize NET plugin IB
40983aab902f:53830:53846 [5] NCCL INFO NET/Socket : Using [0]eth0:192.168.9.2<0>
40983aab902f:53830:53846 [5] NCCL INFO Initialized NET plugin Socket
40983aab902f:53830:53846 [5] NCCL INFO Assigned NET plugin Socket to comm
40983aab902f:53830:53841 [0] NCCL INFO Initialized NET plugin Socket
40983aab902f:53830:53846 [5] NCCL INFO GIN/Plugin: Could not find: libnccl-gin.so

[2026-04-07 09:29:42] 40983aab902f:53830:53846 [5] misc/ibvwrap.cc:173 NCCL WARN lib wrapper not initialized.
40983aab902f:53830:53846 [5] NCCL INFO transport/net_ib/gdr.cc:56 -> 3
40983aab902f:53830:53846 [5] NCCL INFO Failed to initialize any GIN plugin
40983aab902f:53830:53846 [5] NCCL INFO Using network Socket
40983aab902f:53830:53841 [0] NCCL INFO Assigned NET plugin Socket to comm
40983aab902f:53830:53841 [0] NCCL INFO Failed to initialize any GIN plugin
40983aab902f:53830:53841 [0] NCCL INFO Using network Socket
40983aab902f:53830:53847 [6] NCCL INFO Initialized NET plugin Socket
40983aab902f:53830:53847 [6] NCCL INFO Assigned NET plugin Socket to comm
40983aab902f:53830:53847 [6] NCCL INFO Failed to initialize any GIN plugin
40983aab902f:53830:53847 [6] NCCL INFO Using network Socket
40983aab902f:53830:53844 [3] NCCL INFO Initialized NET plugin Socket
40983aab902f:53830:53844 [3] NCCL INFO Assigned NET plugin Socket to comm
40983aab902f:53830:53844 [3] NCCL INFO Failed to initialize any GIN plugin
40983aab902f:53830:53844 [3] NCCL INFO Using network Socket
40983aab902f:53830:53842 [1] NCCL INFO Initialized NET plugin Socket
40983aab902f:53830:53842 [1] NCCL INFO Assigned NET plugin Socket to comm
40983aab902f:53830:53842 [1] NCCL INFO Failed to initialize any GIN plugin
40983aab902f:53830:53842 [1] NCCL INFO Using network Socket
40983aab902f:53830:53848 [7] NCCL INFO Initialized NET plugin Socket
40983aab902f:53830:53848 [7] NCCL INFO Assigned NET plugin Socket to comm
40983aab902f:53830:53848 [7] NCCL INFO Failed to initialize any GIN plugin
40983aab902f:53830:53848 [7] NCCL INFO Using network Socket
40983aab902f:53830:53843 [2] NCCL INFO Initialized NET plugin Socket
40983aab902f:53830:53843 [2] NCCL INFO Assigned NET plugin Socket to comm
40983aab902f:53830:53843 [2] NCCL INFO Failed to initialize any GIN plugin
40983aab902f:53830:53843 [2] NCCL INFO Using network Socket
40983aab902f:53830:53845 [4] NCCL INFO Initialized NET plugin Socket
40983aab902f:53830:53845 [4] NCCL INFO Assigned NET plugin Socket to comm
40983aab902f:53830:53845 [4] NCCL INFO Failed to initialize any GIN plugin
40983aab902f:53830:53845 [4] NCCL INFO Using network Socket
40983aab902f:53830:53846 [5] NCCL INFO [Rank 5] ncclCommInitRankConfig comm 0x563cb8b747a0 rank 5 nranks 8 cudaDev 5 nvmlDev 5 busId 85000 commId 0xfa356df9cba89de0 - Init START
40983aab902f:53830:53841 [0] NCCL INFO [Rank 0] ncclCommInitRankConfig comm 0x563cb8577020 rank 0 nranks 8 cudaDev 0 nvmlDev 0 busId 4000 commId 0xfa356df9cba89de0 - Init START
40983aab902f:53830:53847 [6] NCCL INFO [Rank 6] ncclCommInitRankConfig comm 0x563cb8ca7160 rank 6 nranks 8 cudaDev 6 nvmlDev 6 busId 8a000 commId 0xfa356df9cba89de0 - Init START
40983aab902f:53830:53844 [3] NCCL INFO [Rank 3] ncclCommInitRankConfig comm 0x563cb890f420 rank 3 nranks 8 cudaDev 3 nvmlDev 3 busId b000 commId 0xfa356df9cba89de0 - Init START
40983aab902f:53830:53842 [1] NCCL INFO [Rank 1] ncclCommInitRankConfig comm 0x563cb86a9c40 rank 1 nranks 8 cudaDev 1 nvmlDev 1 busId 5000 commId 0xfa356df9cba89de0 - Init START
40983aab902f:53830:53848 [7] NCCL INFO [Rank 7] ncclCommInitRankConfig comm 0x563cb8dd9b20 rank 7 nranks 8 cudaDev 7 nvmlDev 7 busId 8b000 commId 0xfa356df9cba89de0 - Init START
40983aab902f:53830:53843 [2] NCCL INFO [Rank 2] ncclCommInitRankConfig comm 0x563cb87dc830 rank 2 nranks 8 cudaDev 2 nvmlDev 2 busId a000 commId 0xfa356df9cba89de0 - Init START
40983aab902f:53830:53845 [4] NCCL INFO [Rank 4] ncclCommInitRankConfig comm 0x563cb8a41de0 rank 4 nranks 8 cudaDev 4 nvmlDev 4 busId 84000 commId 0xfa356df9cba89de0 - Init START
40983aab902f:53830:53841 [0] NCCL INFO RAS client listening socket at ::1<28028>
40983aab902f:53830:53841 [0] NCCL INFO Bootstrap timings total 0.077588 (create 0.000046, send 0.000092, recv 0.000297, ring 0.076332, delay 0.000000)
40983aab902f:53830:53848 [7] NCCL INFO Bootstrap timings total 0.077247 (create 0.000044, send 0.000120, recv 0.000069, ring 0.076303, delay 0.000000)
40983aab902f:53830:53844 [3] NCCL INFO Bootstrap timings total 0.077463 (create 0.000038, send 0.000109, recv 0.000885, ring 0.076213, delay 0.000000)
40983aab902f:53830:53842 [1] NCCL INFO Bootstrap timings total 0.077617 (create 0.000036, send 0.000085, recv 0.000723, ring 0.076264, delay 0.000000)
40983aab902f:53830:53847 [6] NCCL INFO Bootstrap timings total 0.077877 (create 0.000050, send 0.000125, recv 0.000384, ring 0.076336, delay 0.000000)
40983aab902f:53830:53846 [5] NCCL INFO Bootstrap timings total 0.078028 (create 0.000058, send 0.000146, recv 0.000252, ring 0.000220, delay 0.000001)
40983aab902f:53830:53845 [4] NCCL INFO Bootstrap timings total 0.076721 (create 0.000049, send 0.000098, recv 0.000310, ring 0.000205, delay 0.000000)
40983aab902f:53830:53843 [2] NCCL INFO Bootstrap timings total 0.077003 (create 0.000050, send 0.000110, recv 0.000395, ring 0.076281, delay 0.000000)
40983aab902f:53830:53848 [7] NCCL INFO MNNVL busId 0x8b000 fabric UUID 0.0 cliqueId 0x0 state 3 healthMask 0x0
40983aab902f:53830:53841 [0] NCCL INFO MNNVL busId 0x4000 fabric UUID 0.0 cliqueId 0x0 state 3 healthMask 0x0
40983aab902f:53830:53847 [6] NCCL INFO MNNVL busId 0x8a000 fabric UUID 0.0 cliqueId 0x0 state 3 healthMask 0x0
40983aab902f:53830:53844 [3] NCCL INFO MNNVL busId 0xb000 fabric UUID 0.0 cliqueId 0x0 state 3 healthMask 0x0
40983aab902f:53830:53843 [2] NCCL INFO MNNVL busId 0xa000 fabric UUID 0.0 cliqueId 0x0 state 3 healthMask 0x0
40983aab902f:53830:53845 [4] NCCL INFO MNNVL busId 0x84000 fabric UUID 0.0 cliqueId 0x0 state 3 healthMask 0x0
40983aab902f:53830:53846 [5] NCCL INFO MNNVL busId 0x85000 fabric UUID 0.0 cliqueId 0x0 state 3 healthMask 0x0
40983aab902f:53830:53842 [1] NCCL INFO MNNVL busId 0x5000 fabric UUID 0.0 cliqueId 0x0 state 3 healthMask 0x0
40983aab902f:53830:53841 [0] NCCL INFO NCCL_TOPO_DUMP_FILE set by environment to ncclSystem.txt
40983aab902f:53830:53844 [3] NCCL INFO ncclTopoGetCpuAffinity: Affinity for GPU 3 is 0-51,104-155. (GPU affinity = 0-51,104-155 ; CPU affinity = 0-207).
40983aab902f:53830:53844 [3] NCCL INFO NVLS multicast support is available on dev 3 (NVLS_NCHANNELS 16)
40983aab902f:53830:53841 [0] NCCL INFO ncclTopoGetCpuAffinity: Affinity for GPU 0 is 0-51,104-155. (GPU affinity = 0-51,104-155 ; CPU affinity = 0-207).
40983aab902f:53830:53842 [1] NCCL INFO ncclTopoGetCpuAffinity: Affinity for GPU 1 is 0-51,104-155. (GPU affinity = 0-51,104-155 ; CPU affinity = 0-207).
40983aab902f:53830:53841 [0] NCCL INFO NVLS multicast support is available on dev 0 (NVLS_NCHANNELS 16)
40983aab902f:53830:53847 [6] NCCL INFO ncclTopoGetCpuAffinity: Affinity for GPU 6 is 52-103,156-207. (GPU affinity = 52-103,156-207 ; CPU affinity = 0-207).
40983aab902f:53830:53845 [4] NCCL INFO ncclTopoGetCpuAffinity: Affinity for GPU 4 is 52-103,156-207. (GPU affinity = 52-103,156-207 ; CPU affinity = 0-207).
40983aab902f:53830:53847 [6] NCCL INFO NVLS multicast support is available on dev 6 (NVLS_NCHANNELS 16)
40983aab902f:53830:53848 [7] NCCL INFO ncclTopoGetCpuAffinity: Affinity for GPU 7 is 52-103,156-207. (GPU affinity = 52-103,156-207 ; CPU affinity = 0-207).
40983aab902f:53830:53848 [7] NCCL INFO NVLS multicast support is available on dev 7 (NVLS_NCHANNELS 16)
40983aab902f:53830:53842 [1] NCCL INFO NVLS multicast support is available on dev 1 (NVLS_NCHANNELS 16)
40983aab902f:53830:53843 [2] NCCL INFO ncclTopoGetCpuAffinity: Affinity for GPU 2 is 0-51,104-155. (GPU affinity = 0-51,104-155 ; CPU affinity = 0-207).
40983aab902f:53830:53843 [2] NCCL INFO NVLS multicast support is available on dev 2 (NVLS_NCHANNELS 16)
40983aab902f:53830:53845 [4] NCCL INFO NVLS multicast support is available on dev 4 (NVLS_NCHANNELS 16)
40983aab902f:53830:53846 [5] NCCL INFO ncclTopoGetCpuAffinity: Affinity for GPU 5 is 52-103,156-207. (GPU affinity = 52-103,156-207 ; CPU affinity = 0-207).
40983aab902f:53830:53846 [5] NCCL INFO NVLS multicast support is available on dev 5 (NVLS_NCHANNELS 16)
40983aab902f:53830:53847 [6] NCCL INFO comm 0x563cb8ca7160 rank 6 nRanks 8 nNodes 1 localRanks 8 localRank 6 MNNVL 0
40983aab902f:53830:53847 [6] NCCL INFO Trees [0] 7/-1/-1->6->5 [1] 7/-1/-1->6->5 [2] 7/-1/-1->6->5 [3] 7/-1/-1->6->5 [4] 7/-1/-1->6->5 [5] 7/-1/-1->6->5 [6] 7/-1/-1->6->5 [7] 7/-1/-1->6->5 [8] 7/-1/-1->6->5 [9] 7/-1/-1->6->5 [10] 7/-1/-1->6->5 [11] 7/-1/-1->6->5 [12] 7/-1/-1->6->5 [13] 7/-1/-1->6->5 [14] 7/-1/-1->6->5 [15] 7/-1/-1->6->5 [16] 7/-1/-1->6->5 [17] 7/-1/-1->6->5 [18] 7/-1/-1->6->5 [19] 7/-1/-1->6->5 [20] 7/-1/-1->6->5 [21] 7/-1/-1->6->5 [22] 7/-1/-1->6->5 [23] 7/-1/-1->6->5
40983aab902f:53830:53847 [6] NCCL INFO P2P Chunksize set to 524288
40983aab902f:53830:53847 [6] NCCL INFO PROFILER/Plugin: Could not find: libnccl-profiler.so
40983aab902f:53830:53847 [6] NCCL INFO Check P2P Type isAllDirectP2p 1 directMode 1 isAllCudaP2p 1
40983aab902f:53830:53846 [5] NCCL INFO comm 0x563cb8b747a0 rank 5 nRanks 8 nNodes 1 localRanks 8 localRank 5 MNNVL 0
40983aab902f:53830:53845 [4] NCCL INFO comm 0x563cb8a41de0 rank 4 nRanks 8 nNodes 1 localRanks 8 localRank 4 MNNVL 0
40983aab902f:53830:53846 [5] NCCL INFO Trees [0] 6/-1/-1->5->4 [1] 6/-1/-1->5->4 [2] 6/-1/-1->5->4 [3] 6/-1/-1->5->4 [4] 6/-1/-1->5->4 [5] 6/-1/-1->5->4 [6] 6/-1/-1->5->4 [7] 6/-1/-1->5->4 [8] 6/-1/-1->5->4 [9] 6/-1/-1->5->4 [10] 6/-1/-1->5->4 [11] 6/-1/-1->5->4 [12] 6/-1/-1->5->4 [13] 6/-1/-1->5->4 [14] 6/-1/-1->5->4 [15] 6/-1/-1->5->4 [16] 6/-1/-1->5->4 [17] 6/-1/-1->5->4 [18] 6/-1/-1->5->4 [19] 6/-1/-1->5->4 [20] 6/-1/-1->5->4 [21] 6/-1/-1->5->4 [22] 6/-1/-1->5->4 [23] 6/-1/-1->5->4
40983aab902f:53830:53845 [4] NCCL INFO Trees [0] 5/-1/-1->4->3 [1] 5/-1/-1->4->3 [2] 5/-1/-1->4->3 [3] 5/-1/-1->4->3 [4] 5/-1/-1->4->3 [5] 5/-1/-1->4->3 [6] 5/-1/-1->4->3 [7] 5/-1/-1->4->3 [8] 5/-1/-1->4->3 [9] 5/-1/-1->4->3 [10] 5/-1/-1->4->3 [11] 5/-1/-1->4->3 [12] 5/-1/-1->4->3 [13] 5/-1/-1->4->3 [14] 5/-1/-1->4->3 [15] 5/-1/-1->4->3 [16] 5/-1/-1->4->3 [17] 5/-1/-1->4->3 [18] 5/-1/-1->4->3 [19] 5/-1/-1->4->3 [20] 5/-1/-1->4->3 [21] 5/-1/-1->4->3 [22] 5/-1/-1->4->3 [23] 5/-1/-1->4->3
40983aab902f:53830:53851 [0] NCCL INFO [Proxy Service UDS] Device 6 CPU core 67
40983aab902f:53830:53845 [4] NCCL INFO P2P Chunksize set to 524288
40983aab902f:53830:53846 [5] NCCL INFO P2P Chunksize set to 524288
40983aab902f:53830:53850 [0] NCCL INFO [Proxy Service] Device 6 CPU core 61
40983aab902f:53830:53848 [7] NCCL INFO comm 0x563cb8dd9b20 rank 7 nRanks 8 nNodes 1 localRanks 8 localRank 7 MNNVL 0
40983aab902f:53830:53848 [7] NCCL INFO Trees [0] -1/-1/-1->7->6 [1] -1/-1/-1->7->6 [2] -1/-1/-1->7->6 [3] -1/-1/-1->7->6 [4] -1/-1/-1->7->6 [5] -1/-1/-1->7->6 [6] -1/-1/-1->7->6 [7] -1/-1/-1->7->6 [8] -1/-1/-1->7->6 [9] -1/-1/-1->7->6 [10] -1/-1/-1->7->6 [11] -1/-1/-1->7->6 [12] -1/-1/-1->7->6 [13] -1/-1/-1->7->6 [14] -1/-1/-1->7->6 [15] -1/-1/-1->7->6 [16] -1/-1/-1->7->6 [17] -1/-1/-1->7->6 [18] -1/-1/-1->7->6 [19] -1/-1/-1->7->6 [20] -1/-1/-1->7->6 [21] -1/-1/-1->7->6 [22] -1/-1/-1->7->6 [23] -1/-1/-1->7->6
40983aab902f:53830:53848 [7] NCCL INFO P2P Chunksize set to 524288
40983aab902f:53830:53841 [0] NCCL INFO comm 0x563cb8577020 rank 0 nRanks 8 nNodes 1 localRanks 8 localRank 0 MNNVL 0
40983aab902f:53830:53841 [0] NCCL INFO Channel 00/24 : 0 1 2 3 4 5 6 7
40983aab902f:53830:53841 [0] NCCL INFO Channel 01/24 : 0 1 2 3 4 5 6 7
40983aab902f:53830:53841 [0] NCCL INFO Channel 02/24 : 0 1 2 3 4 5 6 7
40983aab902f:53830:53841 [0] NCCL INFO Channel 03/24 : 0 1 2 3 4 5 6 7
40983aab902f:53830:53841 [0] NCCL INFO Channel 04/24 : 0 1 2 3 4 5 6 7
40983aab902f:53830:53841 [0] NCCL INFO Channel 05/24 : 0 1 2 3 4 5 6 7
40983aab902f:53830:53841 [0] NCCL INFO Channel 06/24 : 0 1 2 3 4 5 6 7
40983aab902f:53830:53841 [0] NCCL INFO Channel 07/24 : 0 1 2 3 4 5 6 7
40983aab902f:53830:53841 [0] NCCL INFO Channel 08/24 : 0 1 2 3 4 5 6 7
40983aab902f:53830:53841 [0] NCCL INFO Channel 09/24 : 0 1 2 3 4 5 6 7
40983aab902f:53830:53841 [0] NCCL INFO Channel 10/24 : 0 1 2 3 4 5 6 7
40983aab902f:53830:53841 [0] NCCL INFO Channel 11/24 : 0 1 2 3 4 5 6 7
40983aab902f:53830:53841 [0] NCCL INFO Channel 12/24 : 0 1 2 3 4 5 6 7
40983aab902f:53830:53841 [0] NCCL INFO Channel 13/24 : 0 1 2 3 4 5 6 7
40983aab902f:53830:53841 [0] NCCL INFO Channel 14/24 : 0 1 2 3 4 5 6 7
40983aab902f:53830:53841 [0] NCCL INFO Channel 15/24 : 0 1 2 3 4 5 6 7
40983aab902f:53830:53841 [0] NCCL INFO Channel 16/24 : 0 1 2 3 4 5 6 7
40983aab902f:53830:53841 [0] NCCL INFO Channel 17/24 : 0 1 2 3 4 5 6 7
40983aab902f:53830:53841 [0] NCCL INFO Channel 18/24 : 0 1 2 3 4 5 6 7
40983aab902f:53830:53841 [0] NCCL INFO Channel 19/24 : 0 1 2 3 4 5 6 7
40983aab902f:53830:53841 [0] NCCL INFO Channel 20/24 : 0 1 2 3 4 5 6 7
40983aab902f:53830:53841 [0] NCCL INFO Channel 21/24 : 0 1 2 3 4 5 6 7
40983aab902f:53830:53841 [0] NCCL INFO Channel 22/24 : 0 1 2 3 4 5 6 7
40983aab902f:53830:53841 [0] NCCL INFO Channel 23/24 : 0 1 2 3 4 5 6 7
40983aab902f:53830:53841 [0] NCCL INFO Trees [0] 1/-1/-1->0->-1 [1] 1/-1/-1->0->-1 [2] 1/-1/-1->0->-1 [3] 1/-1/-1->0->-1 [4] 1/-1/-1->0->-1 [5] 1/-1/-1->0->-1 [6] 1/-1/-1->0->-1 [7] 1/-1/-1->0->-1 [8] 1/-1/-1->0->-1 [9] 1/-1/-1->0->-1 [10] 1/-1/-1->0->-1 [11] 1/-1/-1->0->-1 [12] 1/-1/-1->0->-1 [13] 1/-1/-1->0->-1 [14] 1/-1/-1->0->-1 [15] 1/-1/-1->0->-1 [16] 1/-1/-1->0->-1 [17] 1/-1/-1->0->-1 [18] 1/-1/-1->0->-1 [19] 1/-1/-1->0->-1 [20] 1/-1/-1->0->-1 [21] 1/-1/-1->0->-1 [22] 1/-1/-1->0->-1 [23] 1/-1/-1->0->-1
40983aab902f:53830:53841 [0] NCCL INFO P2P Chunksize set to 524288
40983aab902f:53830:53843 [2] NCCL INFO comm 0x563cb87dc830 rank 2 nRanks 8 nNodes 1 localRanks 8 localRank 2 MNNVL 0
40983aab902f:53830:53843 [2] NCCL INFO Trees [0] 3/-1/-1->2->1 [1] 3/-1/-1->2->1 [2] 3/-1/-1->2->1 [3] 3/-1/-1->2->1 [4] 3/-1/-1->2->1 [5] 3/-1/-1->2->1 [6] 3/-1/-1->2->1 [7] 3/-1/-1->2->1 [8] 3/-1/-1->2->1 [9] 3/-1/-1->2->1 [10] 3/-1/-1->2->1 [11] 3/-1/-1->2->1 [12] 3/-1/-1->2->1 [13] 3/-1/-1->2->1 [14] 3/-1/-1->2->1 [15] 3/-1/-1->2->1 [16] 3/-1/-1->2->1 [17] 3/-1/-1->2->1 [18] 3/-1/-1->2->1 [19] 3/-1/-1->2->1 [20] 3/-1/-1->2->1 [21] 3/-1/-1->2->1 [22] 3/-1/-1->2->1 [23] 3/-1/-1->2->1
40983aab902f:53830:53843 [2] NCCL INFO P2P Chunksize set to 524288
40983aab902f:53830:53844 [3] NCCL INFO comm 0x563cb890f420 rank 3 nRanks 8 nNodes 1 localRanks 8 localRank 3 MNNVL 0
40983aab902f:53830:53842 [1] NCCL INFO comm 0x563cb86a9c40 rank 1 nRanks 8 nNodes 1 localRanks 8 localRank 1 MNNVL 0
40983aab902f:53830:53844 [3] NCCL INFO Trees [0] 4/-1/-1->3->2 [1] 4/-1/-1->3->2 [2] 4/-1/-1->3->2 [3] 4/-1/-1->3->2 [4] 4/-1/-1->3->2 [5] 4/-1/-1->3->2 [6] 4/-1/-1->3->2 [7] 4/-1/-1->3->2 [8] 4/-1/-1->3->2 [9] 4/-1/-1->3->2 [10] 4/-1/-1->3->2 [11] 4/-1/-1->3->2 [12] 4/-1/-1->3->2 [13] 4/-1/-1->3->2 [14] 4/-1/-1->3->2 [15] 4/-1/-1->3->2 [16] 4/-1/-1->3->2 [17] 4/-1/-1->3->2 [18] 4/-1/-1->3->2 [19] 4/-1/-1->3->2 [20] 4/-1/-1->3->2 [21] 4/-1/-1->3->2 [22] 4/-1/-1->3->2 [23] 4/-1/-1->3->2
40983aab902f:53830:53844 [3] NCCL INFO P2P Chunksize set to 524288
40983aab902f:53830:53842 [1] NCCL INFO Trees [0] 2/-1/-1->1->0 [1] 2/-1/-1->1->0 [2] 2/-1/-1->1->0 [3] 2/-1/-1->1->0 [4] 2/-1/-1->1->0 [5] 2/-1/-1->1->0 [6] 2/-1/-1->1->0 [7] 2/-1/-1->1->0 [8] 2/-1/-1->1->0 [9] 2/-1/-1->1->0 [10] 2/-1/-1->1->0 [11] 2/-1/-1->1->0 [12] 2/-1/-1->1->0 [13] 2/-1/-1->1->0 [14] 2/-1/-1->1->0 [15] 2/-1/-1->1->0 [16] 2/-1/-1->1->0 [17] 2/-1/-1->1->0 [18] 2/-1/-1->1->0 [19] 2/-1/-1->1->0 [20] 2/-1/-1->1->0 [21] 2/-1/-1->1->0 [22] 2/-1/-1->1->0 [23] 2/-1/-1->1->0
40983aab902f:53830:53842 [1] NCCL INFO P2P Chunksize set to 524288
40983aab902f:53830:53846 [5] NCCL INFO Check P2P Type isAllDirectP2p 1 directMode 1 isAllCudaP2p 1
40983aab902f:53830:53852 [0] NCCL INFO [Proxy Service] Device 5 CPU core 176
40983aab902f:53830:53853 [0] NCCL INFO [Proxy Service UDS] Device 5 CPU core 75
40983aab902f:53830:53845 [4] NCCL INFO Check P2P Type isAllDirectP2p 1 directMode 1 isAllCudaP2p 1
40983aab902f:53830:53855 [0] NCCL INFO [Proxy Service UDS] Device 4 CPU core 181
40983aab902f:53830:53854 [0] NCCL INFO [Proxy Service] Device 4 CPU core 170
40983aab902f:53830:53848 [7] NCCL INFO Check P2P Type isAllDirectP2p 1 directMode 1 isAllCudaP2p 1
40983aab902f:53830:53856 [0] NCCL INFO [Proxy Service] Device 7 CPU core 204
40983aab902f:53830:53857 [0] NCCL INFO [Proxy Service UDS] Device 7 CPU core 187
40983aab902f:53830:53843 [2] NCCL INFO Check P2P Type isAllDirectP2p 1 directMode 1 isAllCudaP2p 1
40983aab902f:53830:53858 [0] NCCL INFO [Proxy Service] Device 2 CPU core 143
40983aab902f:53830:53859 [0] NCCL INFO [Proxy Service UDS] Device 2 CPU core 146
40983aab902f:53830:53841 [0] NCCL INFO Check P2P Type isAllDirectP2p 1 directMode 1 isAllCudaP2p 1
40983aab902f:53830:53860 [0] NCCL INFO [Proxy Service] Device 0 CPU core 8
40983aab902f:53830:53861 [0] NCCL INFO [Proxy Service UDS] Device 0 CPU core 17
40983aab902f:53830:53842 [1] NCCL INFO Check P2P Type isAllDirectP2p 1 directMode 1 isAllCudaP2p 1
40983aab902f:53830:53862 [0] NCCL INFO [Proxy Service] Device 1 CPU core 18
40983aab902f:53830:53863 [0] NCCL INFO [Proxy Service UDS] Device 1 CPU core 19
40983aab902f:53830:53844 [3] NCCL INFO Check P2P Type isAllDirectP2p 1 directMode 1 isAllCudaP2p 1
40983aab902f:53830:53864 [0] NCCL INFO [Proxy Service] Device 3 CPU core 2
40983aab902f:53830:53865 [0] NCCL INFO [Proxy Service UDS] Device 3 CPU core 125
40983aab902f:53830:53842 [1] NCCL INFO TUNER/Plugin: Could not find: libnccl-tuner.so
40983aab902f:53830:53842 [1] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512
40983aab902f:53830:53842 [1] NCCL INFO 24 coll channels, 24 collnet channels, 16 nvls channels, 32 p2p channels, 32 p2p channels per peer
40983aab902f:53830:53842 [1] NCCL INFO Symmetric VA size=80GB
40983aab902f:53830:53841 [0] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512
40983aab902f:53830:53841 [0] NCCL INFO 24 coll channels, 24 collnet channels, 16 nvls channels, 32 p2p channels, 32 p2p channels per peer
40983aab902f:53830:53841 [0] NCCL INFO Symmetric VA size=80GB
40983aab902f:53830:53848 [7] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512
40983aab902f:53830:53848 [7] NCCL INFO 24 coll channels, 24 collnet channels, 16 nvls channels, 32 p2p channels, 32 p2p channels per peer
40983aab902f:53830:53848 [7] NCCL INFO Symmetric VA size=80GB
40983aab902f:53830:53845 [4] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512
40983aab902f:53830:53845 [4] NCCL INFO 24 coll channels, 24 collnet channels, 16 nvls channels, 32 p2p channels, 32 p2p channels per peer
40983aab902f:53830:53845 [4] NCCL INFO Symmetric VA size=80GB
40983aab902f:53830:53847 [6] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512
40983aab902f:53830:53847 [6] NCCL INFO 24 coll channels, 24 collnet channels, 16 nvls channels, 32 p2p channels, 32 p2p channels per peer
40983aab902f:53830:53847 [6] NCCL INFO Symmetric VA size=80GB
40983aab902f:53830:53844 [3] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512
40983aab902f:53830:53844 [3] NCCL INFO 24 coll channels, 24 collnet channels, 16 nvls channels, 32 p2p channels, 32 p2p channels per peer
40983aab902f:53830:53844 [3] NCCL INFO Symmetric VA size=80GB
40983aab902f:53830:53846 [5] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512
40983aab902f:53830:53846 [5] NCCL INFO 24 coll channels, 24 collnet channels, 16 nvls channels, 32 p2p channels, 32 p2p channels per peer
40983aab902f:53830:53846 [5] NCCL INFO Symmetric VA size=80GB
40983aab902f:53830:53841 [0] NCCL INFO CC Off, workFifoBytes 1048576
40983aab902f:53830:53843 [2] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512
40983aab902f:53830:53843 [2] NCCL INFO 24 coll channels, 24 collnet channels, 16 nvls channels, 32 p2p channels, 32 p2p channels per peer
40983aab902f:53830:53843 [2] NCCL INFO Symmetric VA size=80GB
40983aab902f:53830:53842 [1] NCCL INFO ncclCommInitRankConfig comm 0x563cb86a9c40 rank 1 nranks 8 cudaDev 1 nvmlDev 1 busId 5000 commId 0xfa356df9cba89de0 - Init COMPLETE
40983aab902f:53830:53842 [1] NCCL INFO Init timings - ncclCommInitRankConfig: rank 1 nranks 8 total 3.03 (kernels 1.22, alloc 0.24, bootstrap 0.08, allgathers 0.03, topo 1.10, graphs 0.05, connections 0.27, rest 0.03)
40983aab902f:53830:53846 [5] NCCL INFO ncclCommInitRankConfig comm 0x563cb8b747a0 rank 5 nranks 8 cudaDev 5 nvmlDev 5 busId 85000 commId 0xfa356df9cba89de0 - Init COMPLETE
40983aab902f:53830:53846 [5] NCCL INFO Init timings - ncclCommInitRankConfig: rank 5 nranks 8 total 3.03 (kernels 1.22, alloc 0.24, bootstrap 0.08, allgathers 0.06, topo 1.11, graphs 0.02, connections 0.29, rest 0.01)
40983aab902f:53830:53841 [0] NCCL INFO ncclCommInitRankConfig comm 0x563cb8577020 rank 0 nranks 8 cudaDev 0 nvmlDev 0 busId 4000 commId 0xfa356df9cba89de0 - Init COMPLETE
40983aab902f:53830:53841 [0] NCCL INFO Init timings - ncclCommInitRankConfig: rank 0 nranks 8 total 3.03 (kernels 1.22, alloc 0.24, bootstrap 0.08, allgathers 0.03, topo 1.10, graphs 0.05, connections 0.28, rest 0.02)
40983aab902f:53830:53844 [3] NCCL INFO ncclCommInitRankConfig comm 0x563cb890f420 rank 3 nranks 8 cudaDev 3 nvmlDev 3 busId b000 commId 0xfa356df9cba89de0 - Init COMPLETE
40983aab902f:53830:53844 [3] NCCL INFO Init timings - ncclCommInitRankConfig: rank 3 nranks 8 total 3.03 (kernels 1.22, alloc 0.24, bootstrap 0.08, allgathers 0.06, topo 1.10, graphs 0.03, connections 0.27, rest 0.03)
40983aab902f:53830:53845 [4] NCCL INFO ncclCommInitRankConfig comm 0x563cb8a41de0 rank 4 nranks 8 cudaDev 4 nvmlDev 4 busId 84000 commId 0xfa356df9cba89de0 - Init COMPLETE
40983aab902f:53830:53845 [4] NCCL INFO Init timings - ncclCommInitRankConfig: rank 4 nranks 8 total 3.03 (kernels 1.22, alloc 0.24, bootstrap 0.08, allgathers 0.05, topo 1.11, graphs 0.03, connections 0.29, rest 0.02)
40983aab902f:53830:53843 [2] NCCL INFO ncclCommInitRankConfig comm 0x563cb87dc830 rank 2 nranks 8 cudaDev 2 nvmlDev 2 busId a000 commId 0xfa356df9cba89de0 - Init COMPLETE
40983aab902f:53830:53843 [2] NCCL INFO Init timings - ncclCommInitRankConfig: rank 2 nranks 8 total 3.03 (kernels 1.22, alloc 0.24, bootstrap 0.08, allgathers 0.07, topo 1.11, graphs 0.01, connections 0.28, rest 0.02)
40983aab902f:53830:53848 [7] NCCL INFO ncclCommInitRankConfig comm 0x563cb8dd9b20 rank 7 nranks 8 cudaDev 7 nvmlDev 7 busId 8b000 commId 0xfa356df9cba89de0 - Init COMPLETE
40983aab902f:53830:53848 [7] NCCL INFO Init timings - ncclCommInitRankConfig: rank 7 nranks 8 total 3.03 (kernels 1.22, alloc 0.24, bootstrap 0.08, allgathers 0.07, topo 1.11, graphs 0.02, connections 0.28, rest 0.02)
40983aab902f:53830:53847 [6] NCCL INFO ncclCommInitRankConfig comm 0x563cb8ca7160 rank 6 nranks 8 cudaDev 6 nvmlDev 6 busId 8a000 commId 0xfa356df9cba89de0 - Init COMPLETE
40983aab902f:53830:53847 [6] NCCL INFO Init timings - ncclCommInitRankConfig: rank 6 nranks 8 total 3.03 (kernels 1.22, alloc 0.24, bootstrap 0.08, allgathers 0.05, topo 1.11, graphs 0.03, connections 0.31, rest 0.01)
40983aab902f:53830:53830 [7] NCCL INFO Symmetric VA size=80GB
40983aab902f:53830:53830 [6] NCCL INFO Symmetric VA size=80GB
40983aab902f:53830:53830 [5] NCCL INFO Symmetric VA size=80GB
40983aab902f:53830:53830 [4] NCCL INFO Symmetric VA size=80GB
40983aab902f:53830:53830 [3] NCCL INFO Symmetric VA size=80GB
40983aab902f:53830:53830 [2] NCCL INFO Symmetric VA size=80GB
40983aab902f:53830:53830 [1] NCCL INFO Symmetric VA size=80GB
40983aab902f:53830:53830 [0] NCCL INFO Symmetric VA size=80GB
40983aab902f:53830:53868 [5] NCCL INFO Channel 00/0 : 5[5] -> 6[6] via P2P/direct pointer
40983aab902f:53830:53867 [6] NCCL INFO Channel 00/0 : 6[6] -> 7[7] via P2P/direct pointer
40983aab902f:53830:53868 [5] NCCL INFO Channel 01/0 : 5[5] -> 6[6] via P2P/direct pointer
40983aab902f:53830:53868 [5] NCCL INFO Channel 02/0 : 5[5] -> 6[6] via P2P/direct pointer
40983aab902f:53830:53868 [5] NCCL INFO Channel 03/0 : 5[5] -> 6[6] via P2P/direct pointer
40983aab902f:53830:53867 [6] NCCL INFO Channel 01/0 : 6[6] -> 7[7] via P2P/direct pointer
40983aab902f:53830:53868 [5] NCCL INFO Channel 04/0 : 5[5] -> 6[6] via P2P/direct pointer
40983aab902f:53830:53867 [6] NCCL INFO Channel 02/0 : 6[6] -> 7[7] via P2P/direct pointer
40983aab902f:53830:53871 [2] NCCL INFO Channel 00/0 : 2[2] -> 3[3] via P2P/direct pointer
40983aab902f:53830:53867 [6] NCCL INFO Channel 03/0 : 6[6] -> 7[7] via P2P/direct pointer
40983aab902f:53830:53868 [5] NCCL INFO Channel 05/0 : 5[5] -> 6[6] via P2P/direct pointer
40983aab902f:53830:53871 [2] NCCL INFO Channel 01/0 : 2[2] -> 3[3] via P2P/direct pointer
40983aab902f:53830:53872 [1] NCCL INFO Channel 00/0 : 1[1] -> 2[2] via P2P/direct pointer
40983aab902f:53830:53868 [5] NCCL INFO Channel 06/0 : 5[5] -> 6[6] via P2P/direct pointer
40983aab902f:53830:53871 [2] NCCL INFO Channel 02/0 : 2[2] -> 3[3] via P2P/direct pointer
40983aab902f:53830:53869 [4] NCCL INFO Channel 00/0 : 4[4] -> 5[5] via P2P/direct pointer
40983aab902f:53830:53866 [7] NCCL INFO Channel 00/0 : 7[7] -> 0[0] via P2P/direct pointer
40983aab902f:53830:53867 [6] NCCL INFO Channel 04/0 : 6[6] -> 7[7] via P2P/direct pointer
40983aab902f:53830:53868 [5] NCCL INFO Channel 07/0 : 5[5] -> 6[6] via P2P/direct pointer
40983aab902f:53830:53872 [1] NCCL INFO Channel 01/0 : 1[1] -> 2[2] via P2P/direct pointer
40983aab902f:53830:53869 [4] NCCL INFO Channel 01/0 : 4[4] -> 5[5] via P2P/direct pointer
40983aab902f:53830:53867 [6] NCCL INFO Channel 05/0 : 6[6] -> 7[7] via P2P/direct pointer
40983aab902f:53830:53871 [2] NCCL INFO Channel 03/0 : 2[2] -> 3[3] via P2P/direct pointer
40983aab902f:53830:53866 [7] NCCL INFO Channel 01/0 : 7[7] -> 0[0] via P2P/direct pointer
40983aab902f:53830:53868 [5] NCCL INFO Channel 08/0 : 5[5] -> 6[6] via P2P/direct pointer
40983aab902f:53830:53869 [4] NCCL INFO Channel 02/0 : 4[4] -> 5[5] via P2P/direct pointer
40983aab902f:53830:53872 [1] NCCL INFO Channel 02/0 : 1[1] -> 2[2] via P2P/direct pointer
40983aab902f:53830:53867 [6] NCCL INFO Channel 06/0 : 6[6] -> 7[7] via P2P/direct pointer
40983aab902f:53830:53871 [2] NCCL INFO Channel 04/0 : 2[2] -> 3[3] via P2P/direct pointer
40983aab902f:53830:53872 [1] NCCL INFO Channel 03/0 : 1[1] -> 2[2] via P2P/direct pointer
40983aab902f:53830:53871 [2] NCCL INFO Channel 05/0 : 2[2] -> 3[3] via P2P/direct pointer
40983aab902f:53830:53866 [7] NCCL INFO Channel 02/0 : 7[7] -> 0[0] via P2P/direct pointer
40983aab902f:53830:53869 [4] NCCL INFO Channel 03/0 : 4[4] -> 5[5] via P2P/direct pointer
40983aab902f:53830:53867 [6] NCCL INFO Channel 07/0 : 6[6] -> 7[7] via P2P/direct pointer
40983aab902f:53830:53870 [3] NCCL INFO Channel 00/0 : 3[3] -> 4[4] via P2P/direct pointer
40983aab902f:53830:53872 [1] NCCL INFO Channel 04/0 : 1[1] -> 2[2] via P2P/direct pointer
40983aab902f:53830:53868 [5] NCCL INFO Channel 09/0 : 5[5] -> 6[6] via P2P/direct pointer
40983aab902f:53830:53873 [0] NCCL INFO Channel 00/0 : 0[0] -> 1[1] via P2P/direct pointer
40983aab902f:53830:53871 [2] NCCL INFO Channel 06/0 : 2[2] -> 3[3] via P2P/direct pointer
40983aab902f:53830:53867 [6] NCCL INFO Channel 08/0 : 6[6] -> 7[7] via P2P/direct pointer
40983aab902f:53830:53866 [7] NCCL INFO Channel 03/0 : 7[7] -> 0[0] via P2P/direct pointer
40983aab902f:53830:53868 [5] NCCL INFO Channel 10/0 : 5[5] -> 6[6] via P2P/direct pointer
40983aab902f:53830:53869 [4] NCCL INFO Channel 04/0 : 4[4] -> 5[5] via P2P/direct pointer
40983aab902f:53830:53866 [7] NCCL INFO Channel 04/0 : 7[7] -> 0[0] via P2P/direct pointer
40983aab902f:53830:53870 [3] NCCL INFO Channel 01/0 : 3[3] -> 4[4] via P2P/direct pointer
40983aab902f:53830:53872 [1] NCCL INFO Channel 05/0 : 1[1] -> 2[2] via P2P/direct pointer
40983aab902f:53830:53869 [4] NCCL INFO Channel 05/0 : 4[4] -> 5[5] via P2P/direct pointer
40983aab902f:53830:53867 [6] NCCL INFO Channel 09/0 : 6[6] -> 7[7] via P2P/direct pointer
40983aab902f:53830:53871 [2] NCCL INFO Channel 07/0 : 2[2] -> 3[3] via P2P/direct pointer
40983aab902f:53830:53873 [0] NCCL INFO Channel 01/0 : 0[0] -> 1[1] via P2P/direct pointer
40983aab902f:53830:53868 [5] NCCL INFO Channel 11/0 : 5[5] -> 6[6] via P2P/direct pointer
40983aab902f:53830:53872 [1] NCCL INFO Channel 06/0 : 1[1] -> 2[2] via P2P/direct pointer
40983aab902f:53830:53867 [6] NCCL INFO Channel 10/0 : 6[6] -> 7[7] via P2P/direct pointer
40983aab902f:53830:53868 [5] NCCL INFO Channel 12/0 : 5[5] -> 6[6] via P2P/direct pointer
40983aab902f:53830:53869 [4] NCCL INFO Channel 06/0 : 4[4] -> 5[5] via P2P/direct pointer
40983aab902f:53830:53873 [0] NCCL INFO Channel 02/0 : 0[0] -> 1[1] via P2P/direct pointer
40983aab902f:53830:53866 [7] NCCL INFO Channel 05/0 : 7[7] -> 0[0] via P2P/direct pointer
40983aab902f:53830:53871 [2] NCCL INFO Channel 08/0 : 2[2] -> 3[3] via P2P/direct pointer
40983aab902f:53830:53872 [1] NCCL INFO Channel 07/0 : 1[1] -> 2[2] via P2P/direct pointer
40983aab902f:53830:53870 [3] NCCL INFO Channel 02/0 : 3[3] -> 4[4] via P2P/direct pointer
40983aab902f:53830:53867 [6] NCCL INFO Channel 11/0 : 6[6] -> 7[7] via P2P/direct pointer
40983aab902f:53830:53868 [5] NCCL INFO Channel 13/0 : 5[5] -> 6[6] via P2P/direct pointer
40983aab902f:53830:53869 [4] NCCL INFO Channel 07/0 : 4[4] -> 5[5] via P2P/direct pointer
40983aab902f:53830:53866 [7] NCCL INFO Channel 06/0 : 7[7] -> 0[0] via P2P/direct pointer
40983aab902f:53830:53871 [2] NCCL INFO Channel 09/0 : 2[2] -> 3[3] via P2P/direct pointer
40983aab902f:53830:53870 [3] NCCL INFO Channel 03/0 : 3[3] -> 4[4] via P2P/direct pointer
40983aab902f:53830:53867 [6] NCCL INFO Channel 12/0 : 6[6] -> 7[7] via P2P/direct pointer
40983aab902f:53830:53869 [4] NCCL INFO Channel 08/0 : 4[4] -> 5[5] via P2P/direct pointer
40983aab902f:53830:53872 [1] NCCL INFO Channel 08/0 : 1[1] -> 2[2] via P2P/direct pointer
40983aab902f:53830:53868 [5] NCCL INFO Channel 14/0 : 5[5] -> 6[6] via P2P/direct pointer
40983aab902f:53830:53871 [2] NCCL INFO Channel 10/0 : 2[2] -> 3[3] via P2P/direct pointer
40983aab902f:53830:53873 [0] NCCL INFO Channel 03/0 : 0[0] -> 1[1] via P2P/direct pointer
40983aab902f:53830:53867 [6] NCCL INFO Channel 13/0 : 6[6] -> 7[7] via P2P/direct pointer
40983aab902f:53830:53872 [1] NCCL INFO Channel 09/0 : 1[1] -> 2[2] via P2P/direct pointer
40983aab902f:53830:53869 [4] NCCL INFO Channel 09/0 : 4[4] -> 5[5] via P2P/direct pointer
40983aab902f:53830:53873 [0] NCCL INFO Channel 04/0 : 0[0] -> 1[1] via P2P/direct pointer
40983aab902f:53830:53870 [3] NCCL INFO Channel 04/0 : 3[3] -> 4[4] via P2P/direct pointer
40983aab902f:53830:53866 [7] NCCL INFO Channel 07/0 : 7[7] -> 0[0] via P2P/direct pointer
40983aab902f:53830:53871 [2] NCCL INFO Channel 11/0 : 2[2] -> 3[3] via P2P/direct pointer
40983aab902f:53830:53868 [5] NCCL INFO Channel 15/0 : 5[5] -> 6[6] via P2P/direct pointer
40983aab902f:53830:53866 [7] NCCL INFO Channel 08/0 : 7[7] -> 0[0] via P2P/direct pointer
40983aab902f:53830:53869 [4] NCCL INFO Channel 10/0 : 4[4] -> 5[5] via P2P/direct pointer
40983aab902f:53830:53870 [3] NCCL INFO Channel 05/0 : 3[3] -> 4[4] via P2P/direct pointer
40983aab902f:53830:53873 [0] NCCL INFO Channel 05/0 : 0[0] -> 1[1] via P2P/direct pointer
40983aab902f:53830:53871 [2] NCCL INFO Channel 12/0 : 2[2] -> 3[3] via P2P/direct pointer
40983aab902f:53830:53867 [6] NCCL INFO Channel 14/0 : 6[6] -> 7[7] via P2P/direct pointer
40983aab902f:53830:53866 [7] NCCL INFO Channel 09/0 : 7[7] -> 0[0] via P2P/direct pointer
40983aab902f:53830:53869 [4] NCCL INFO Channel 11/0 : 4[4] -> 5[5] via P2P/direct pointer
40983aab902f:53830:53872 [1] NCCL INFO Channel 10/0 : 1[1] -> 2[2] via P2P/direct pointer
40983aab902f:53830:53868 [5] NCCL INFO Channel 16/0 : 5[5] -> 6[6] via P2P/direct pointer
40983aab902f:53830:53867 [6] NCCL INFO Channel 15/0 : 6[6] -> 7[7] via P2P/direct pointer
40983aab902f:53830:53870 [3] NCCL INFO Channel 06/0 : 3[3] -> 4[4] via P2P/direct pointer
40983aab902f:53830:53869 [4] NCCL INFO Channel 12/0 : 4[4] -> 5[5] via P2P/direct pointer
40983aab902f:53830:53873 [0] NCCL INFO Channel 06/0 : 0[0] -> 1[1] via P2P/direct pointer
40983aab902f:53830:53866 [7] NCCL INFO Channel 10/0 : 7[7] -> 0[0] via P2P/direct pointer
40983aab902f:53830:53867 [6] NCCL INFO Channel 16/0 : 6[6] -> 7[7] via P2P/direct pointer
40983aab902f:53830:53871 [2] NCCL INFO Channel 13/0 : 2[2] -> 3[3] via P2P/direct pointer
40983aab902f:53830:53869 [4] NCCL INFO Channel 13/0 : 4[4] -> 5[5] via P2P/direct pointer
40983aab902f:53830:53873 [0] NCCL INFO Channel 07/0 : 0[0] -> 1[1] via P2P/direct pointer
40983aab902f:53830:53872 [1] NCCL INFO Channel 11/0 : 1[1] -> 2[2] via P2P/direct pointer
40983aab902f:53830:53868 [5] NCCL INFO Channel 17/0 : 5[5] -> 6[6] via P2P/direct pointer
40983aab902f:53830:53866 [7] NCCL INFO Channel 11/0 : 7[7] -> 0[0] via P2P/direct pointer
40983aab902f:53830:53870 [3] NCCL INFO Channel 07/0 : 3[3] -> 4[4] via P2P/direct pointer
40983aab902f:53830:53871 [2] NCCL INFO Channel 14/0 : 2[2] -> 3[3] via P2P/direct pointer
40983aab902f:53830:53866 [7] NCCL INFO Channel 12/0 : 7[7] -> 0[0] via P2P/direct pointer
40983aab902f:53830:53867 [6] NCCL INFO Channel 17/0 : 6[6] -> 7[7] via P2P/direct pointer
40983aab902f:53830:53868 [5] NCCL INFO Channel 18/0 : 5[5] -> 6[6] via P2P/direct pointer
40983aab902f:53830:53869 [4] NCCL INFO Channel 14/0 : 4[4] -> 5[5] via P2P/direct pointer
40983aab902f:53830:53870 [3] NCCL INFO Channel 08/0 : 3[3] -> 4[4] via P2P/direct pointer
40983aab902f:53830:53871 [2] NCCL INFO Channel 15/0 : 2[2] -> 3[3] via P2P/direct pointer
40983aab902f:53830:53873 [0] NCCL INFO Channel 08/0 : 0[0] -> 1[1] via P2P/direct pointer
40983aab902f:53830:53866 [7] NCCL INFO Channel 13/0 : 7[7] -> 0[0] via P2P/direct pointer
40983aab902f:53830:53868 [5] NCCL INFO Channel 19/0 : 5[5] -> 6[6] via P2P/direct pointer
40983aab902f:53830:53872 [1] NCCL INFO Channel 12/0 : 1[1] -> 2[2] via P2P/direct pointer
40983aab902f:53830:53873 [0] NCCL INFO Channel 09/0 : 0[0] -> 1[1] via P2P/direct pointer
40983aab902f:53830:53870 [3] NCCL INFO Channel 09/0 : 3[3] -> 4[4] via P2P/direct pointer
40983aab902f:53830:53866 [7] NCCL INFO Channel 14/0 : 7[7] -> 0[0] via P2P/direct pointer
40983aab902f:53830:53867 [6] NCCL INFO Channel 18/0 : 6[6] -> 7[7] via P2P/direct pointer
40983aab902f:53830:53869 [4] NCCL INFO Channel 15/0 : 4[4] -> 5[5] via P2P/direct pointer
40983aab902f:53830:53868 [5] NCCL INFO Channel 20/0 : 5[5] -> 6[6] via P2P/direct pointer
40983aab902f:53830:53870 [3] NCCL INFO Channel 10/0 : 3[3] -> 4[4] via P2P/direct pointer
40983aab902f:53830:53871 [2] NCCL INFO Channel 16/0 : 2[2] -> 3[3] via P2P/direct pointer
40983aab902f:53830:53866 [7] NCCL INFO Channel 15/0 : 7[7] -> 0[0] via P2P/direct pointer
40983aab902f:53830:53872 [1] NCCL INFO Channel 13/0 : 1[1] -> 2[2] via P2P/direct pointer
40983aab902f:53830:53869 [4] NCCL INFO Channel 16/0 : 4[4] -> 5[5] via P2P/direct pointer
40983aab902f:53830:53870 [3] NCCL INFO Channel 11/0 : 3[3] -> 4[4] via P2P/direct pointer
40983aab902f:53830:53871 [2] NCCL INFO Channel 17/0 : 2[2] -> 3[3] via P2P/direct pointer
40983aab902f:53830:53873 [0] NCCL INFO Channel 10/0 : 0[0] -> 1[1] via P2P/direct pointer
40983aab902f:53830:53872 [1] NCCL INFO Channel 14/0 : 1[1] -> 2[2] via P2P/direct pointer
40983aab902f:53830:53868 [5] NCCL INFO Channel 21/0 : 5[5] -> 6[6] via P2P/direct pointer
40983aab902f:53830:53870 [3] NCCL INFO Channel 12/0 : 3[3] -> 4[4] via P2P/direct pointer
40983aab902f:53830:53866 [7] NCCL INFO Channel 16/0 : 7[7] -> 0[0] via P2P/direct pointer
40983aab902f:53830:53867 [6] NCCL INFO Channel 19/0 : 6[6] -> 7[7] via P2P/direct pointer
40983aab902f:53830:53869 [4] NCCL INFO Channel 17/0 : 4[4] -> 5[5] via P2P/direct pointer
40983aab902f:53830:53868 [5] NCCL INFO Channel 22/0 : 5[5] -> 6[6] via P2P/direct pointer
40983aab902f:53830:53871 [2] NCCL INFO Channel 18/0 : 2[2] -> 3[3] via P2P/direct pointer
40983aab902f:53830:53867 [6] NCCL INFO Channel 20/0 : 6[6] -> 7[7] via P2P/direct pointer
40983aab902f:53830:53873 [0] NCCL INFO Channel 11/0 : 0[0] -> 1[1] via P2P/direct pointer
40983aab902f:53830:53869 [4] NCCL INFO Channel 18/0 : 4[4] -> 5[5] via P2P/direct pointer
40983aab902f:53830:53868 [5] NCCL INFO Channel 23/0 : 5[5] -> 6[6] via P2P/direct pointer
40983aab902f:53830:53870 [3] NCCL INFO Channel 13/0 : 3[3] -> 4[4] via P2P/direct pointer
40983aab902f:53830:53867 [6] NCCL INFO Channel 21/0 : 6[6] -> 7[7] via P2P/direct pointer
40983aab902f:53830:53866 [7] NCCL INFO Channel 17/0 : 7[7] -> 0[0] via P2P/direct pointer
40983aab902f:53830:53872 [1] NCCL INFO Channel 15/0 : 1[1] -> 2[2] via P2P/direct pointer
40983aab902f:53830:53871 [2] NCCL INFO Channel 19/0 : 2[2] -> 3[3] via P2P/direct pointer
40983aab902f:53830:53869 [4] NCCL INFO Channel 19/0 : 4[4] -> 5[5] via P2P/direct pointer
40983aab902f:53830:53873 [0] NCCL INFO Channel 12/0 : 0[0] -> 1[1] via P2P/direct pointer
40983aab902f:53830:53870 [3] NCCL INFO Channel 14/0 : 3[3] -> 4[4] via P2P/direct pointer
40983aab902f:53830:53867 [6] NCCL INFO Channel 22/0 : 6[6] -> 7[7] via P2P/direct pointer
40983aab902f:53830:53871 [2] NCCL INFO Channel 20/0 : 2[2] -> 3[3] via P2P/direct pointer
40983aab902f:53830:53872 [1] NCCL INFO Channel 16/0 : 1[1] -> 2[2] via P2P/direct pointer
40983aab902f:53830:53869 [4] NCCL INFO Channel 20/0 : 4[4] -> 5[5] via P2P/direct pointer
40983aab902f:53830:53873 [0] NCCL INFO Channel 13/0 : 0[0] -> 1[1] via P2P/direct pointer
40983aab902f:53830:53866 [7] NCCL INFO Channel 18/0 : 7[7] -> 0[0] via P2P/direct pointer
40983aab902f:53830:53867 [6] NCCL INFO Channel 23/0 : 6[6] -> 7[7] via P2P/direct pointer
40983aab902f:53830:53870 [3] NCCL INFO Channel 15/0 : 3[3] -> 4[4] via P2P/direct pointer
40983aab902f:53830:53870 [3] NCCL INFO Channel 16/0 : 3[3] -> 4[4] via P2P/direct pointer
40983aab902f:53830:53873 [0] NCCL INFO Channel 14/0 : 0[0] -> 1[1] via P2P/direct pointer
40983aab902f:53830:53871 [2] NCCL INFO Channel 21/0 : 2[2] -> 3[3] via P2P/direct pointer
40983aab902f:53830:53872 [1] NCCL INFO Channel 17/0 : 1[1] -> 2[2] via P2P/direct pointer
40983aab902f:53830:53869 [4] NCCL INFO Channel 21/0 : 4[4] -> 5[5] via P2P/direct pointer
40983aab902f:53830:53866 [7] NCCL INFO Channel 19/0 : 7[7] -> 0[0] via P2P/direct pointer
40983aab902f:53830:53872 [1] NCCL INFO Channel 18/0 : 1[1] -> 2[2] via P2P/direct pointer
40983aab902f:53830:53871 [2] NCCL INFO Channel 22/0 : 2[2] -> 3[3] via P2P/direct pointer
40983aab902f:53830:53870 [3] NCCL INFO Channel 17/0 : 3[3] -> 4[4] via P2P/direct pointer
40983aab902f:53830:53873 [0] NCCL INFO Channel 15/0 : 0[0] -> 1[1] via P2P/direct pointer
40983aab902f:53830:53866 [7] NCCL INFO Channel 20/0 : 7[7] -> 0[0] via P2P/direct pointer
40983aab902f:53830:53871 [2] NCCL INFO Channel 23/0 : 2[2] -> 3[3] via P2P/direct pointer
40983aab902f:53830:53872 [1] NCCL INFO Channel 19/0 : 1[1] -> 2[2] via P2P/direct pointer
40983aab902f:53830:53870 [3] NCCL INFO Channel 18/0 : 3[3] -> 4[4] via P2P/direct pointer
40983aab902f:53830:53866 [7] NCCL INFO Channel 21/0 : 7[7] -> 0[0] via P2P/direct pointer
40983aab902f:53830:53869 [4] NCCL INFO Channel 22/0 : 4[4] -> 5[5] via P2P/direct pointer
40983aab902f:53830:53866 [7] NCCL INFO Channel 22/0 : 7[7] -> 0[0] via P2P/direct pointer
40983aab902f:53830:53870 [3] NCCL INFO Channel 19/0 : 3[3] -> 4[4] via P2P/direct pointer
40983aab902f:53830:53872 [1] NCCL INFO Channel 20/0 : 1[1] -> 2[2] via P2P/direct pointer
40983aab902f:53830:53869 [4] NCCL INFO Channel 23/0 : 4[4] -> 5[5] via P2P/direct pointer
40983aab902f:53830:53873 [0] NCCL INFO Channel 16/0 : 0[0] -> 1[1] via P2P/direct pointer
40983aab902f:53830:53866 [7] NCCL INFO Channel 23/0 : 7[7] -> 0[0] via P2P/direct pointer
40983aab902f:53830:53872 [1] NCCL INFO Channel 21/0 : 1[1] -> 2[2] via P2P/direct pointer
40983aab902f:53830:53870 [3] NCCL INFO Channel 20/0 : 3[3] -> 4[4] via P2P/direct pointer
40983aab902f:53830:53873 [0] NCCL INFO Channel 17/0 : 0[0] -> 1[1] via P2P/direct pointer
40983aab902f:53830:53870 [3] NCCL INFO Channel 21/0 : 3[3] -> 4[4] via P2P/direct pointer
40983aab902f:53830:53872 [1] NCCL INFO Channel 22/0 : 1[1] -> 2[2] via P2P/direct pointer
40983aab902f:53830:53872 [1] NCCL INFO Channel 23/0 : 1[1] -> 2[2] via P2P/direct pointer
40983aab902f:53830:53873 [0] NCCL INFO Channel 18/0 : 0[0] -> 1[1] via P2P/direct pointer
40983aab902f:53830:53870 [3] NCCL INFO Channel 22/0 : 3[3] -> 4[4] via P2P/direct pointer
40983aab902f:53830:53870 [3] NCCL INFO Channel 23/0 : 3[3] -> 4[4] via P2P/direct pointer
40983aab902f:53830:53873 [0] NCCL INFO Channel 19/0 : 0[0] -> 1[1] via P2P/direct pointer
40983aab902f:53830:53873 [0] NCCL INFO Channel 20/0 : 0[0] -> 1[1] via P2P/direct pointer
40983aab902f:53830:53873 [0] NCCL INFO Channel 21/0 : 0[0] -> 1[1] via P2P/direct pointer
40983aab902f:53830:53873 [0] NCCL INFO Channel 22/0 : 0[0] -> 1[1] via P2P/direct pointer
40983aab902f:53830:53873 [0] NCCL INFO Channel 23/0 : 0[0] -> 1[1] via P2P/direct pointer
40983aab902f:53830:53872 [1] NCCL INFO Connected all rings, use ring PXN 0 GDR 1
40983aab902f:53830:53869 [4] NCCL INFO Connected all rings, use ring PXN 0 GDR 1
40983aab902f:53830:53871 [2] NCCL INFO Connected all rings, use ring PXN 0 GDR 1
40983aab902f:53830:53866 [7] NCCL INFO Connected all rings, use ring PXN 0 GDR 1
40983aab902f:53830:53867 [6] NCCL INFO Connected all rings, use ring PXN 0 GDR 1
40983aab902f:53830:53870 [3] NCCL INFO Connected all rings, use ring PXN 0 GDR 1
40983aab902f:53830:53868 [5] NCCL INFO Connected all rings, use ring PXN 0 GDR 1
40983aab902f:53830:53873 [0] NCCL INFO Connected all rings, use ring PXN 0 GDR 1
40983aab902f:53830:53830 [0] NCCL INFO comm 0x563cb8577020 rank 0 nranks 8 cudaDev 0 busId 4000 - Destroy COMPLETE
40983aab902f:53830:53830 [7] NCCL INFO comm 0x563cb8dd9b20 rank 7 nranks 8 cudaDev 7 busId 8b000 - Destroy COMPLETE
40983aab902f:53830:53830 [6] NCCL INFO comm 0x563cb8ca7160 rank 6 nranks 8 cudaDev 6 busId 8a000 - Destroy COMPLETE
40983aab902f:53830:53830 [5] NCCL INFO comm 0x563cb8b747a0 rank 5 nranks 8 cudaDev 5 busId 85000 - Destroy COMPLETE
40983aab902f:53830:53830 [4] NCCL INFO comm 0x563cb8a41de0 rank 4 nranks 8 cudaDev 4 busId 84000 - Destroy COMPLETE
40983aab902f:53830:53830 [3] NCCL INFO comm 0x563cb890f420 rank 3 nranks 8 cudaDev 3 busId b000 - Destroy COMPLETE
40983aab902f:53830:53830 [2] NCCL INFO comm 0x563cb87dc830 rank 2 nranks 8 cudaDev 2 busId a000 - Destroy COMPLETE
40983aab902f:53830:53830 [1] NCCL INFO comm 0x563cb86a9c40 rank 1 nranks 8 cudaDev 1 busId 5000 - Destroy COMPLETE
40983aab902f:53830:53830 [7] NCCL INFO ENV/Plugin: Closing env plugin ncclEnvDefault
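
If the NCCL 28 log is attached as well, one way to compare the two runs is to summarize the `Channel ... via ...` lines per ring edge and diff the summaries. The following is a minimal sketch, not part of NCCL or nccl-tests: the function name `summarize_channels` is hypothetical, and the regular expression only assumes the line format visible in the log above.

```python
import re
from collections import defaultdict

# Matches lines such as:
#   ... NCCL INFO Channel 02/0 : 3[3] -> 4[4] via P2P/direct pointer
CHANNEL_RE = re.compile(
    r"NCCL INFO Channel (\d+)/\d+ : (\d+)\[\d+\] -> (\d+)\[\d+\] via (.+)$"
)

def summarize_channels(log_text):
    """Return {(src_rank, dst_rank, transport): set of channel ids}."""
    summary = defaultdict(set)
    for line in log_text.splitlines():
        match = CHANNEL_RE.search(line.rstrip())
        if match:
            channel, src, dst, transport = match.groups()
            summary[(int(src), int(dst), transport)].add(int(channel))
    return dict(summary)

# Three lines copied from the log above, used as a small sample.
SAMPLE_LOG = """\
40983aab902f:53830:53870 [3] NCCL INFO Channel 02/0 : 3[3] -> 4[4] via P2P/direct pointer
40983aab902f:53830:53866 [7] NCCL INFO Channel 06/0 : 7[7] -> 0[0] via P2P/direct pointer
40983aab902f:53830:53870 [3] NCCL INFO Channel 03/0 : 3[3] -> 4[4] via P2P/direct pointer
"""

for edge, channels in sorted(summarize_channels(SAMPLE_LOG).items()):
    print(edge, sorted(channels))
```

Running the same function over both full logs and comparing the resulting dictionaries would show whether the channel count or transport per edge differs between the two versions.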

Steps to Reproduce the Issue

  1. Build NCCL 2.28 and NCCL 2.29, plus nccl-tests (I'm using clang as the C compiler, in case it matters).
  2. Run ./build/all_reduce_perf -g 8 -b 1024 -e 63488 -f 2 -n 5000 -w 1000 against both NCCL 2.28 and NCCL 2.29.
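
The two steps above can be sketched as a small driver that runs the same benchmark command against each NCCL build. This is only an illustration under stated assumptions: the library directories in `NCCL_BUILDS` are hypothetical placeholders, and it assumes `all_reduce_perf` links `libnccl.so` dynamically so that `LD_LIBRARY_PATH` selects the build. Setting `NCCL_DEBUG=VERSION` makes NCCL print the version it actually loaded, which helps confirm the right library was picked up.

```python
import os
import shlex
import subprocess

# Hypothetical locations of the two NCCL builds; adjust to your setup.
NCCL_BUILDS = {
    "2.28": "/opt/nccl-2.28/build/lib",
    "2.29": "/opt/nccl-2.29/build/lib",
}

# Benchmark command copied from step 2 above.
PERF_CMD = "./build/all_reduce_perf -g 8 -b 1024 -e 63488 -f 2 -n 5000 -w 1000"

def make_run(version, base_env=None):
    """Return (argv, env) for running the benchmark against one NCCL build."""
    env = dict(os.environ if base_env is None else base_env)
    lib_dir = NCCL_BUILDS[version]
    previous = env.get("LD_LIBRARY_PATH", "")
    # Put the chosen build first so the dynamic loader prefers it.
    env["LD_LIBRARY_PATH"] = lib_dir + (":" + previous if previous else "")
    env["NCCL_DEBUG"] = "VERSION"  # print the NCCL version that was loaded
    return shlex.split(PERF_CMD), env

def run_all():
    """Run the benchmark once per build (needs the GPUs and the binaries)."""
    for version in NCCL_BUILDS:
        argv, env = make_run(version)
        print(f"# NCCL {version}", flush=True)
        subprocess.run(argv, env=env, check=True)
```

Calling `run_all()` from the nccl-tests directory would print one result table per build; `make_run` can be used alone to inspect the command and environment without launching anything.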

Results:

NCCL 29:

#                                                              out-of-place                       in-place          
#       size         count      type   redop    root     time   algbw   busbw  #wrong     time   algbw   busbw  #wrong 
#        (B)    (elements)                               (us)  (GB/s)  (GB/s)             (us)  (GB/s)  (GB/s)         
        1024           256     float     sum      -1    37.90    0.03    0.05       0    36.62    0.03    0.05       0
        2048           512     float     sum      -1    37.08    0.06    0.10       0    36.50    0.06    0.10       0
        4096          1024     float     sum      -1    37.35    0.11    0.19       0    37.97    0.11    0.19       0
        8192          2048     float     sum      -1    38.38    0.21    0.37       0    37.66    0.22    0.38       0
       16384          4096     float     sum      -1    38.36    0.43    0.75       0    38.51    0.43    0.74       0
       32768          8192     float     sum      -1    38.25    0.86    1.50       0    37.68    0.87    1.52       0
# Out of bounds values : 0 OK
# Avg bus bandwidth    : 0.494946

NCCL 28:

#                                                              out-of-place                       in-place          
#       size         count      type   redop    root     time   algbw   busbw  #wrong     time   algbw   busbw  #wrong 
#        (B)    (elements)                               (us)  (GB/s)  (GB/s)             (us)  (GB/s)  (GB/s)         
        1024           256     float     sum      -1    34.57    0.03    0.05       0    33.18    0.03    0.05       0
        2048           512     float     sum      -1    33.40    0.06    0.11       0    32.92    0.06    0.11       0
        4096          1024     float     sum      -1    32.51    0.13    0.22       0    33.21    0.12    0.22       0
        8192          2048     float     sum      -1    32.89    0.25    0.44       0    33.51    0.24    0.43       0
       16384          4096     float     sum      -1    32.41    0.51    0.88       0    32.86    0.50    0.87       0
       32768          8192     float     sum      -1    33.56    0.98    1.71       0    33.46    0.98    1.71       0
# Out of bounds values : 0 OK
# Avg bus bandwidth    : 0.566818

Latency with NCCL 29 seems to be ~10% higher than with NCCL 28 (roughly 10-18% across the sizes in the out-of-place column above). Is this expected?
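
To make the per-size difference explicit, the two tables can be parsed and compared directly. The sketch below is an illustration only: `parse_perf` and `slowdown` are hypothetical helper names, the parser assumes the 13-column row layout shown above, and the embedded rows are copied from the two result tables.

```python
def parse_perf(text):
    """Map message size in bytes -> (out-of-place us, in-place us)."""
    rows = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows have 13 columns and start with the size; '#' lines are skipped.
        if len(fields) != 13 or not fields[0].isdigit():
            continue
        rows[int(fields[0])] = (float(fields[5]), float(fields[9]))
    return rows

def slowdown(new, old):
    """Percent increase of out-of-place time in `new` over `old`, per size."""
    return {
        size: 100.0 * (new[size][0] / old[size][0] - 1.0)
        for size in sorted(new)
        if size in old
    }

NCCL_29 = """\
#       size         count      type   redop    root     time   algbw   busbw  #wrong     time   algbw   busbw  #wrong
        1024           256     float     sum      -1    37.90    0.03    0.05       0    36.62    0.03    0.05       0
        2048           512     float     sum      -1    37.08    0.06    0.10       0    36.50    0.06    0.10       0
        4096          1024     float     sum      -1    37.35    0.11    0.19       0    37.97    0.11    0.19       0
        8192          2048     float     sum      -1    38.38    0.21    0.37       0    37.66    0.22    0.38       0
       16384          4096     float     sum      -1    38.36    0.43    0.75       0    38.51    0.43    0.74       0
       32768          8192     float     sum      -1    38.25    0.86    1.50       0    37.68    0.87    1.52       0
"""

NCCL_28 = """\
        1024           256     float     sum      -1    34.57    0.03    0.05       0    33.18    0.03    0.05       0
        2048           512     float     sum      -1    33.40    0.06    0.11       0    32.92    0.06    0.11       0
        4096          1024     float     sum      -1    32.51    0.13    0.22       0    33.21    0.12    0.22       0
        8192          2048     float     sum      -1    32.89    0.25    0.44       0    33.51    0.24    0.43       0
       16384          4096     float     sum      -1    32.41    0.51    0.88       0    32.86    0.50    0.87       0
       32768          8192     float     sum      -1    33.56    0.98    1.71       0    33.46    0.98    1.71       0
"""

result = slowdown(parse_perf(NCCL_29), parse_perf(NCCL_28))
for size, percent in result.items():
    print(f"{size:>6} B: {percent:+.1f}% out-of-place time")
```

With these numbers the out-of-place slowdown falls between roughly 10% (1024 B) and 18% (16384 B).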

NCCL Version

2.29.7+cuda12.3

Your platform details

Simple 8xH100 machine.

Error Message & Behavior

Small-message all-reduce latency regressed by ~10% or more between NCCL 28 and NCCL 29 (see the result tables above).
