[nccl-ep] Remove redundant cudaStreamSynchronize in ncclEpCreateHandle #2082

Open
kwen2501 wants to merge 1 commit into NVIDIA:master from kwen2501:rm-sync

@kwen2501 (Collaborator) commented Apr 1, 2026

Description

`ncclAllGather` and `call_metadata_preprocessing` are enqueued on the same CUDA stream, so stream ordering already guarantees that the allgather completes before the preprocessing kernel starts executing. Additionally, nothing else appears to use the allgather result, `global_routing_map`, so the host has no reason to wait for it.
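To illustrate the ordering argument, here is a minimal standalone sketch. It does not use the actual NCCL EP code: `fake_allgather` and `fake_preprocess` are hypothetical stand-in kernels for `ncclAllGather` and `call_metadata_preprocessing`, and the buffer sizes are arbitrary. It only shows that a consumer enqueued on the same stream sees the producer's writes without a host-side `cudaStreamSynchronize` in between.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                     \
    do {                                                                \
        cudaError_t err_ = (call);                                      \
        if (err_ != cudaSuccess) {                                      \
            fprintf(stderr, "%s failed: %s\n", #call,                   \
                    cudaGetErrorString(err_));                          \
            exit(EXIT_FAILURE);                                         \
        }                                                               \
    } while (0)

// Hypothetical stand-in for ncclAllGather: produces the "gathered" buffer.
__global__ void fake_allgather(int *global_map, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) global_map[i] = i;
}

// Hypothetical stand-in for call_metadata_preprocessing: consumes the buffer.
__global__ void fake_preprocess(const int *global_map, int *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = 2 * global_map[i];
}

int main() {
    const int n = 1 << 20;
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    cudaStream_t stream;
    int *global_map = nullptr;
    int *out = nullptr;
    CHECK(cudaStreamCreate(&stream));
    CHECK(cudaMalloc(&global_map, n * sizeof(int)));
    CHECK(cudaMalloc(&out, n * sizeof(int)));

    // Producer and consumer are enqueued back to back on the same stream.
    fake_allgather<<<blocks, threads, 0, stream>>>(global_map, n);
    // No cudaStreamSynchronize(stream) here: work within one stream executes
    // in issue order, so the consumer cannot start before the producer ends.
    fake_preprocess<<<blocks, threads, 0, stream>>>(global_map, out, n);
    CHECK(cudaGetLastError());

    // The host synchronizes only at the point where it reads a result itself.
    int last = 0;
    CHECK(cudaMemcpyAsync(&last, out + (n - 1), sizeof(int),
                          cudaMemcpyDeviceToHost, stream));
    CHECK(cudaStreamSynchronize(stream));
    printf("out[n-1] = %d (expected %d)\n", last, 2 * (n - 1));

    CHECK(cudaFree(global_map));
    CHECK(cudaFree(out));
    CHECK(cudaStreamDestroy(stream));
    return last == 2 * (n - 1) ? EXIT_SUCCESS : EXIT_FAILURE;
}
```

A host-side sync between the two launches would only be required if the CPU, or work on a different stream without an event dependency, read `global_map` in between.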

Related Issues

Changes & Impact

Removed the redundant `cudaStreamSynchronize` call in `ncclEpCreateHandle`.

Performance Impact

Should save one unnecessary GPU-CPU synchronization in `ncclEpCreateHandle`.

kwen2501 requested review from artpol84 and sb17v on April 1, 2026 00:19
…ndle`

`ncclAllGather` and `call_metadata_preprocessing` are enqueued on the same
CUDA stream, so stream ordering already guarantees that the allgather
completes before the preprocessing kernel starts executing. Additionally,
nothing else appears to use the allgather result,
`global_routing_map`.

Co-authored-by: Claude <noreply@anthropic.com>
Signed-off-by: Ke Wen <kwen@nvidia.com>
@kwen2501 (Collaborator, Author) commented Apr 2, 2026

/mirror

@jskrobola (Collaborator) commented

/mirror-test
