[Question]: send in reduce-scatter VS directsend in all-reduce #2086

### Question

I found an interesting difference. I had thought that all-reduce and reduce-scatter share the same (k - 1) steps.
However, reduce-scatter only uses `send` and `recv`, while all-reduce uses `directSend` and `directRecv`.
https://github.com/NVIDIA/nccl/blob/master/src/device/all_reduce.h#L48
https://github.com/NVIDIA/nccl/blob/master/src/device/reduce_scatter.h#L42
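To make the premise concrete, here is a minimal Python simulation of a textbook ring schedule. It assumes k ranks that each hold k chunks. It is not NCCL's code: the function names and chunk indexing are hypothetical, and it only models which chunk moves where. It says nothing about the `send` vs `directSend` distinction. It shows all-reduce as the same k - 1 reduce steps as reduce-scatter, followed by k - 1 copy (all-gather) steps.

```python
# Hypothetical ring simulation for illustration only; NOT NCCL's implementation.

def ring_reduce_scatter(data):
    """data[r][c] is rank r's value for chunk c. Runs k - 1 ring steps.

    In this indexing, rank r ends up owning the fully reduced chunk (r + 1) % k.
    """
    k = len(data)
    buf = [row[:] for row in data]  # copy so the input is not mutated
    for step in range(k - 1):
        # Every rank sends one chunk to its right neighbour at the same time.
        sends = [(r, (r - step) % k) for r in range(k)]
        payload = [buf[r][c] for r, c in sends]  # snapshot before applying
        for (r, c), val in zip(sends, payload):
            buf[(r + 1) % k][c] += val  # receiver reduces into its own chunk
    return buf


def ring_all_reduce(data):
    """Same k - 1 reduce steps as above, then k - 1 all-gather (copy) steps."""
    k = len(data)
    buf = ring_reduce_scatter(data)
    for step in range(k - 1):
        sends = [(r, (r + 1 - step) % k) for r in range(k)]
        payload = [buf[r][c] for r, c in sends]
        for (r, c), val in zip(sends, payload):
            buf[(r + 1) % k][c] = val  # receiver copies, no reduction
    return buf


if __name__ == "__main__":
    data = [[1, 2, 3], [10, 20, 30], [100, 200, 300]]
    print(ring_reduce_scatter(data))
    print(ring_all_reduce(data))
```

In this model, reduce-scatter takes k - 1 steps and all-reduce takes 2(k - 1) steps, and the first k - 1 steps of both are identical.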

I wonder what the difference is, and what causes it. Is it related to performance?
PS: GPT-5 told me that reduce-scatter has fewer steps, so it can save the time spent on pointer conversion. Is that right?
