Fixes and improvements to NCCL user buffer registration feature. #122

Merged: romerojosh merged 4 commits into main from nccl_ubr_fix on Apr 30, 2026

@romerojosh

Collaborator

This PR fixes a long-standing issue with the NCCL user buffer registration feature (enabled via CUDECOMP_ENABLE_NCCL_UBR) and makes several improvements to registration handling/cleanup.

The main fix is that the calls to ncclCommRegister within cudecompMalloc have always registered the wrong memory address. In particular, the registration calls were passed buffer, which in the context of cudecompMalloc is a pointer to the allocation pointer (a void **), when they should have been passed *buffer. Critically, this means CUDECOMP_ENABLE_NCCL_UBR has essentially been a no-op up to now. Users who previously tried this feature and did not see a performance impact should consider trying it again after this patch lands.
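To make the pointer mix-up concrete, here is a minimal sketch of the pattern, not the actual cuDecomp code. `mock_register`, `alloc_and_register_buggy`, and `alloc_and_register_fixed` are hypothetical names invented for illustration; `mock_register` stands in for a registration call that accepts a `void*` buffer address and simply records what it receives. Note that a `void**` converts implicitly to `void*`, so the buggy form compiles without a diagnostic.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical stand-in for a registration call taking a void* buffer address.
// It only records the arguments so the difference can be inspected.
static void* g_registered_addr = nullptr;
static std::size_t g_registered_size = 0;

void mock_register(void* buff, std::size_t size) {
  g_registered_addr = buff;
  g_registered_size = size;
}

// Buggy pattern: registers the address of the caller's pointer variable.
void alloc_and_register_buggy(void** buffer, std::size_t size) {
  *buffer = std::malloc(size);
  assert(*buffer != nullptr);
  mock_register(buffer, size);  // void** silently converts to void*
}

// Fixed pattern: registers the allocation that the pointer refers to.
void alloc_and_register_fixed(void** buffer, std::size_t size) {
  *buffer = std::malloc(size);
  assert(*buffer != nullptr);
  mock_register(*buffer, size);
}
```

With `void* p = nullptr;`, calling `alloc_and_register_buggy(&p, 64)` records `&p` (the location of the pointer variable), whereas `alloc_and_register_fixed(&p, 64)` records `p` (the allocated block), which is the address later used for communication.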

Beyond this main fix, this PR also adds code to improve registration cleanup within cudecompFree and cudecompFinalize.
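The description does not say how the cleanup is implemented. Purely as an illustration of the general idea, one common approach is to keep a table of live registrations, drop a single entry on the free path, and drop whatever remains on the finalize path. Everything below is an assumption: `RegistrationTable`, `fake_register`, and `fake_deregister` are hypothetical names, and the stand-in functions only count live registrations rather than calling into NCCL.

```cpp
#include <cstddef>
#include <unordered_map>

// Hypothetical stand-ins for register/deregister calls; they only keep a count.
static int g_live_registrations = 0;

void* fake_register(void* buff, std::size_t /*size*/) {
  ++g_live_registrations;
  return buff;  // use the buffer address as an opaque handle for this sketch
}

void fake_deregister(void* /*handle*/) { --g_live_registrations; }

// Tracks which allocations currently hold a registration handle.
struct RegistrationTable {
  std::unordered_map<void*, void*> handles;  // allocation -> handle

  void add(void* alloc, std::size_t size) {
    handles[alloc] = fake_register(alloc, size);
  }

  // Free path: release the registration for one allocation, if present.
  bool release(void* alloc) {
    auto it = handles.find(alloc);
    if (it == handles.end()) return false;
    fake_deregister(it->second);
    handles.erase(it);
    return true;
  }

  // Finalize path: release every registration that is still live.
  void release_all() {
    for (auto& entry : handles) fake_deregister(entry.second);
    handles.clear();
  }
};
```

In this sketch a free routine would call `release(ptr)` before freeing the memory, and a finalize routine would call `release_all()` so that no registration outlives the library handle.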

Signed-off-by: Josh Romero <joshr@nvidia.com>
@romerojosh

Collaborator Author

/build

@github-actions


🚀 Build workflow triggered! View run

@github-actions


✅ Build workflow passed! View run

@romerojosh romerojosh merged commit 022d0b3 into main Apr 30, 2026
4 checks passed
@romerojosh romerojosh deleted the nccl_ubr_fix branch April 30, 2026 18:04