Skip to content

[UR] Fix buffer allocation logic#21725

Merged
KornevNikita merged 1 commit intosyclfrom
test/fix-cuda-use-host-pointer-buff
Apr 23, 2026
Merged

[UR] Fix buffer allocation logic#21725
KornevNikita merged 1 commit intosyclfrom
test/fix-cuda-use-host-pointer-buff

Conversation

@kweronsx
Copy link
Copy Markdown
Contributor

@kweronsx kweronsx commented Apr 10, 2026

This PR fixes a long-standing issue (#9789) where UR_MEM_FLAG_USE_HOST_POINTER was silently ignored in the CUDA and HIP adapters, causing the host data to be copied to the device instead of being mapped.

What changed:

Removed the dead EnableUseHostPtr = false flag in both cuda/memory.cpp and hip/memory.cpp. This flag was hardcoded to false with a TODO comment, meaning the USE_HOST_PTR path was never actually taken — the buffer always fell back to copying.

Enabled the UseHostPtr allocation mode by properly entering that branch when UR_MEM_FLAG_USE_HOST_POINTER is set. The hipHostRegister / cuMemHostRegister call was moved from urMemBufferCreate into allocateMemObjOnDeviceIfNeeded (i.e. it is deferred to lazy device allocation), which is where it already lived.

Added HostPtrRegisteredByUR ownership flag to BufferMem in both headers. This boolean is set to true only when UR itself performed the registration. It ensures cuMemHostUnregister / hipHostUnregister is only called by the owner, preventing a double-unregister.

Handled already-registered pointers gracefully: CUDA_ERROR_HOST_MEMORY_ALREADY_REGISTERED / hipErrorHostMemoryAlreadyRegistered is now tolerated in allocateMemObjOnDeviceIfNeeded. This covers cases where the same host pointer is registered across multiple lazy device allocations (e.g. multi-device contexts).

Explicit copy constructors were added to BufferMem in both adapters to deliberately not copy HostPtrRegisteredByUR, making ownership semantics clear — only the original object is responsible for unregistering.

Conformance test update: CUDA and HIP are removed from the UseHostPointer known-failure list in urMemBufferCreate.cpp, reflecting that the feature now works correctly on those backends.

@kweronsx kweronsx force-pushed the test/fix-cuda-use-host-pointer-buff branch 3 times, most recently from 8a2d0e3 to 6d27f2a Compare April 14, 2026 14:05
@kweronsx kweronsx marked this pull request as ready for review April 15, 2026 09:12
@kweronsx kweronsx requested review from a team as code owners April 15, 2026 09:12
@kweronsx kweronsx requested a review from kekaczma April 15, 2026 09:12
@kweronsx
Copy link
Copy Markdown
Contributor Author

This PR started from an attempt to get the UseHostPointer conformance test in urMemBufferCreate to pass for CUDA and HIP.

UR_MEM_FLAG_USE_HOST_POINTER had been silently treated as a copy operation in both adapters, hidden behind a hardcoded EnableUseHostPtr = false guard. The guard was accompanied by a comment citing a "weird segfault after program ends," but no such segfault was observed, so the dead code has been removed.

Enabling the feature revealed one real issue: cuMemHostRegister/hipHostRegister is called once during buffer creation in urMemBufferCreate, and then again in allocateMemObjOnDeviceIfNeeded at kernel submission time, resulting in a double-registration. This is handled by tolerating CUDA_ERROR_HOST_MEMORY_ALREADY_REGISTERED/hipErrorHostMemoryAlreadyRegistered in allocateMemObjOnDeviceIfNeeded instead of propagating it as an error.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates Unified Runtime buffer creation behavior to better support UR_MEM_FLAG_USE_HOST_POINTER by using host-memory registration (CUDA/HIP) rather than falling back to an initial copy, and adjusts conformance expectations accordingly.

Changes:

  • Enable UR_MEM_FLAG_USE_HOST_POINTER paths for HIP and CUDA adapters (removing the previous “disabled” gating logic).
  • Refine “initial copy” logic so it only triggers for UR_MEM_FLAG_ALLOC_COPY_HOST_POINTER.
  • Update conformance test expectations by removing HIP/CUDA from the known-failing UseHostPointer test list.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File Description
unified-runtime/test/conformance/memory/urMemBufferCreate.cpp Removes HIP/CUDA from known failures for UseHostPointer conformance test.
unified-runtime/source/adapters/hip/memory.cpp Switches USE_HOST_POINTER to UseHostPtr alloc mode and adjusts registration behavior/error handling.
unified-runtime/source/adapters/cuda/memory.cpp Switches USE_HOST_POINTER to UseHostPtr alloc mode and adjusts registration error handling.
Comments suppressed due to low confidence (1)

unified-runtime/source/adapters/hip/memory.cpp:107

  • UR_MEM_FLAG_USE_HOST_POINTER now sets AllocMode::UseHostPtr, but urMemBufferCreate no longer registers the host memory. Since BufferMem::clear() unconditionally calls hipHostUnregister(HostPtr) for UseHostPtr, releasing a buffer that was never used on a device will attempt to unregister an unregistered pointer and can fail/crash. Register the host pointer in urMemBufferCreate when USE_HOST_POINTER is set, or track whether registration actually happened and only unregister in that case.
    auto HostPtr = pProperties ? pProperties->pHost : nullptr;
    BufferMem::AllocMode AllocMode = BufferMem::AllocMode::Classic;
    if (flags & UR_MEM_FLAG_USE_HOST_POINTER) {
      AllocMode = BufferMem::AllocMode::UseHostPtr;
    } else if (flags & UR_MEM_FLAG_ALLOC_HOST_POINTER) {
      UR_CHECK_ERROR(hipHostMalloc(&HostPtr, size));
      AllocMode = BufferMem::AllocMode::AllocHostPtr;
    } else if (flags & UR_MEM_FLAG_ALLOC_COPY_HOST_POINTER) {
      AllocMode = BufferMem::AllocMode::CopyIn;
    }

Comment thread unified-runtime/source/adapters/hip/memory.cpp
Comment thread unified-runtime/source/adapters/cuda/memory.cpp Outdated
Comment thread unified-runtime/source/adapters/cuda/memory.cpp
Comment thread unified-runtime/source/adapters/cuda/memory.cpp Outdated
@kweronsx kweronsx marked this pull request as draft April 15, 2026 12:25
@kweronsx kweronsx requested a review from ldorau April 15, 2026 13:39
@kweronsx kweronsx marked this pull request as ready for review April 15, 2026 13:39
@ldorau ldorau requested a review from Copilot April 15, 2026 13:56
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

Comment thread unified-runtime/source/adapters/cuda/memory.cpp Outdated
Comment thread unified-runtime/source/adapters/cuda/memory.cpp
Comment thread unified-runtime/source/adapters/hip/memory.hpp
Comment thread unified-runtime/source/adapters/cuda/memory.cpp Outdated
@kweronsx kweronsx marked this pull request as draft April 16, 2026 06:06
@kweronsx kweronsx marked this pull request as ready for review April 16, 2026 07:12
@kweronsx kweronsx requested a review from ldorau April 16, 2026 07:12
Copy link
Copy Markdown
Contributor

@ldorau ldorau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two issues left:

Comment thread unified-runtime/source/adapters/cuda/memory.cpp Outdated
Comment thread unified-runtime/source/adapters/cuda/memory.hpp Outdated
@kweronsx kweronsx marked this pull request as draft April 16, 2026 08:18
@kweronsx kweronsx requested a review from ldorau April 16, 2026 09:18
@kweronsx kweronsx marked this pull request as ready for review April 16, 2026 09:19
Comment thread unified-runtime/source/adapters/hip/memory.hpp
@kweronsx kweronsx marked this pull request as draft April 16, 2026 09:34
@kweronsx kweronsx marked this pull request as ready for review April 16, 2026 10:12
@kweronsx kweronsx requested a review from ldorau April 16, 2026 10:12
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated no new comments.

Comment thread unified-runtime/source/adapters/hip/memory.hpp
@kweronsx kweronsx marked this pull request as draft April 16, 2026 10:29
@kweronsx kweronsx marked this pull request as ready for review April 16, 2026 10:35
@kweronsx kweronsx requested a review from ldorau April 16, 2026 10:35
@ldorau ldorau requested a review from Copilot April 16, 2026 10:38
Copy link
Copy Markdown
Contributor

@ldorau ldorau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated no new comments.

@ldorau
Copy link
Copy Markdown
Contributor

ldorau commented Apr 16, 2026

Review please @intel/llvm-reviewers-cuda

@ldorau
Copy link
Copy Markdown
Contributor

ldorau commented Apr 16, 2026

@kweronsx please update the description of this PR (it is empty now) #21725 (comment), because it will be used as the description of the final commit squashed from all commits from this PR.

@bratpiorka
Copy link
Copy Markdown
Contributor

LGTM but please squash commits

@kweronsx kweronsx force-pushed the test/fix-cuda-use-host-pointer-buff branch 2 times, most recently from 3879504 to 0be7c05 Compare April 21, 2026 10:13
…tration

This PR fixes issue #9789 (#9789)
where UR_MEM_FLAG_USE_HOST_POINTER was silently ignored in the CUDA and HIP
adapters, causing the host data to be copied to the device instead of being
mapped.

What changed:

Removed the dead EnableUseHostPtr = false flag in both cuda/memory.cpp and
hip/memory.cpp. This flag was hardcoded to false with a TODO comment, meaning
the USE_HOST_PTR path was never actually taken — the buffer always fell back to
copying.

Enabled the UseHostPtr allocation mode by properly entering that branch when
UR_MEM_FLAG_USE_HOST_POINTER is set. The hipHostRegister / cuMemHostRegister
call was moved from urMemBufferCreate into allocateMemObjOnDeviceIfNeeded
(i.e. it is deferred to lazy device allocation), which is where it already lived.

Added HostPtrRegisteredByUR ownership flag to BufferMem in both headers. This
boolean is set to true only when UR itself performed the registration. It
ensures cuMemHostUnregister / hipHostUnregister is only called by the owner,
preventing a double-unregister.

Handled already-registered pointers gracefully:
CUDA_ERROR_HOST_MEMORY_ALREADY_REGISTERED /
hipErrorHostMemoryAlreadyRegistered is now tolerated in
allocateMemObjOnDeviceIfNeeded. This covers cases where the same host
pointer is registered across multiple lazy device allocations
(e.g. multi-device contexts).

Conformance test update: CUDA and HIP are removed from the
UseHostPointer known-failure list in urMemBufferCreate.cpp, reflecting
that the feature now works correctly on those backends.
@bratpiorka
Copy link
Copy Markdown
Contributor

@kswiecicki ready to merge

@github-actions
Copy link
Copy Markdown
Contributor

@intel/llvm-gatekeepers please consider merging

@KornevNikita KornevNikita merged commit d5f040b into sycl Apr 23, 2026
60 of 62 checks passed
@kweronsx kweronsx deleted the test/fix-cuda-use-host-pointer-buff branch April 23, 2026 10:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants