[UR] Fix buffer allocation logic by kweronsx · Pull Request #21725 · intel/llvm

kweronsx · 2026-04-10T08:25:18Z

This PR fixes a long-standing issue (#9789) where UR_MEM_FLAG_USE_HOST_POINTER was silently ignored in the CUDA and HIP adapters, causing the host data to be copied to the device instead of being mapped.

What changed:

Removed the dead EnableUseHostPtr = false flag in both cuda/memory.cpp and hip/memory.cpp. This flag was hardcoded to false with a TODO comment, meaning the USE_HOST_PTR path was never actually taken — the buffer always fell back to copying.

Enabled the UseHostPtr allocation mode by properly entering that branch when UR_MEM_FLAG_USE_HOST_POINTER is set. The hipHostRegister / cuMemHostRegister call was removed from urMemBufferCreate, so registration now happens only in allocateMemObjOnDeviceIfNeeded (i.e. it is deferred to lazy device allocation), where a registration call already existed.

Added HostPtrRegisteredByUR ownership flag to BufferMem in both headers. This boolean is set to true only when UR itself performed the registration. It ensures cuMemHostUnregister / hipHostUnregister is only called by the owner, preventing a double-unregister.

Handled already-registered pointers gracefully: CUDA_ERROR_HOST_MEMORY_ALREADY_REGISTERED / hipErrorHostMemoryAlreadyRegistered is now tolerated in allocateMemObjOnDeviceIfNeeded. This covers cases where the same host pointer is registered across multiple lazy device allocations (e.g. multi-device contexts).
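
To illustrate the registration side of the two points above, here is a minimal sketch, not the adapter code itself. `RegStatus`, `mockHostRegister`, `BufferSketch` and `registerHostPtrIfNeeded` are hypothetical names; the mock stands in for cuMemHostRegister / hipHostRegister so that the logic can be compiled without a GPU toolchain. Only `HostPtrRegisteredByUR` is taken from the PR description.

```cpp
#include <cassert>
#include <set>

// Hypothetical stand-in for the driver result codes. The real adapters check
// CUresult / hipError_t values such as CUDA_ERROR_HOST_MEMORY_ALREADY_REGISTERED.
enum class RegStatus { Success, AlreadyRegistered, Error };

// Mock of the driver's table of pinned host pointers (assumption for the sketch).
static std::set<const void *> &pinnedRegistry() {
  static std::set<const void *> Registry;
  return Registry;
}

// Mock of cuMemHostRegister / hipHostRegister.
RegStatus mockHostRegister(const void *Ptr) {
  if (Ptr == nullptr)
    return RegStatus::Error;
  // insert().second is false when the pointer was already in the set.
  return pinnedRegistry().insert(Ptr).second ? RegStatus::Success
                                             : RegStatus::AlreadyRegistered;
}

struct BufferSketch {
  void *HostPtr = nullptr;
  // True only when this object performed the registration itself.
  bool HostPtrRegisteredByUR = false;
};

// Returns true when the host pointer is usable by the device, either because
// this buffer registered it or because somebody else already had.
bool registerHostPtrIfNeeded(BufferSketch &Buf) {
  if (Buf.HostPtrRegisteredByUR)
    return true; // already registered by this buffer, nothing to do
  switch (mockHostRegister(Buf.HostPtr)) {
  case RegStatus::Success:
    Buf.HostPtrRegisteredByUR = true; // we own the registration
    return true;
  case RegStatus::AlreadyRegistered:
    return true; // tolerated: usable, but not ours to unregister
  case RegStatus::Error:
    return false; // any other failure is still propagated
  }
  return false;
}
```

In this sketch, two buffers that wrap the same host pointer both succeed, but only the first one ends up with `HostPtrRegisteredByUR` set.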

Explicit copy constructors were added to BufferMem in both adapters to deliberately not copy HostPtrRegisteredByUR, making ownership semantics clear — only the original object is responsible for unregistering.
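
A copy constructor with these semantics could look roughly like the sketch below. `BufferMemSketch` and its `Size` member are hypothetical; the actual BufferMem class in the adapters has more members, which are omitted here.

```cpp
#include <cassert>
#include <cstddef>

struct BufferMemSketch {
  void *HostPtr = nullptr;
  std::size_t Size = 0;
  // Set by the object that actually registered HostPtr with the driver.
  bool HostPtrRegisteredByUR = false;

  BufferMemSketch(void *Ptr, std::size_t Bytes) : HostPtr(Ptr), Size(Bytes) {
    assert(Ptr != nullptr || Bytes == 0); // a non-empty buffer needs a pointer
  }

  // Copies the description of the buffer but deliberately not the ownership
  // of the registration: the copy must never unregister HostPtr.
  BufferMemSketch(const BufferMemSketch &Other)
      : HostPtr(Other.HostPtr), Size(Other.Size),
        HostPtrRegisteredByUR(false) {}
};
```

With this shape, destroying or clearing a copy leaves the registration in place, and the original remains the only object that unregisters it.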

Conformance test update: CUDA and HIP are removed from the UseHostPointer known-failure list in urMemBufferCreate.cpp, reflecting that the feature now works correctly on those backends.

kweronsx · 2026-04-15T09:49:00Z

This PR started from an attempt to get the UseHostPointer conformance test in urMemBufferCreate to pass for CUDA and HIP.

UR_MEM_FLAG_USE_HOST_POINTER had been silently treated as a copy operation in both adapters, hidden behind a hardcoded EnableUseHostPtr = false guard. The guard was accompanied by a comment citing a "weird segfault after program ends," but no such segfault was observed, so the dead code has been removed.

Enabling the feature revealed one real issue: cuMemHostRegister/hipHostRegister is called once during buffer creation in urMemBufferCreate, and then again in allocateMemObjOnDeviceIfNeeded at kernel submission time, resulting in a double-registration. This is handled by tolerating CUDA_ERROR_HOST_MEMORY_ALREADY_REGISTERED/hipErrorHostMemoryAlreadyRegistered in allocateMemObjOnDeviceIfNeeded instead of propagating it as an error.

Copilot

Pull request overview

This PR updates Unified Runtime buffer creation behavior to better support UR_MEM_FLAG_USE_HOST_POINTER by using host-memory registration (CUDA/HIP) rather than falling back to an initial copy, and adjusts conformance expectations accordingly.

Changes:

Enable UR_MEM_FLAG_USE_HOST_POINTER paths for HIP and CUDA adapters (removing the previous “disabled” gating logic).
Refine “initial copy” logic so it only triggers for UR_MEM_FLAG_ALLOC_COPY_HOST_POINTER.
Update conformance test expectations by removing HIP/CUDA from the known-failing UseHostPointer test list.
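
The flag handling summarized in the first two items can be sketched as follows. The flag constants are hypothetical bit values chosen for this example, not the real ur_mem_flags_t values from the UR headers, and `selectAllocMode` / `needsInitialCopy` are illustrative helper names that do not appear in the PR.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical flag bits for illustration only.
constexpr std::uint32_t kUseHostPointer = 1u << 0;
constexpr std::uint32_t kAllocHostPointer = 1u << 1;
constexpr std::uint32_t kAllocCopyHostPointer = 1u << 2;

// Mode names follow BufferMem::AllocMode as quoted in the review comment.
enum class AllocMode { Classic, UseHostPtr, CopyIn, AllocHostPtr };

// USE_HOST_POINTER now selects UseHostPtr directly, with no disabling guard.
AllocMode selectAllocMode(std::uint32_t Flags) {
  if (Flags & kUseHostPointer)
    return AllocMode::UseHostPtr;
  if (Flags & kAllocHostPointer)
    return AllocMode::AllocHostPtr;
  if (Flags & kAllocCopyHostPointer)
    return AllocMode::CopyIn;
  return AllocMode::Classic;
}

// The initial host-to-device copy is requested only by ALLOC_COPY_HOST_POINTER.
bool needsInitialCopy(std::uint32_t Flags) {
  return (Flags & kAllocCopyHostPointer) != 0;
}
```

Under these assumptions a buffer created with the use-host-pointer flag is mapped rather than copied, which is the behavior the UseHostPointer conformance test checks for.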

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| unified-runtime/test/conformance/memory/urMemBufferCreate.cpp | Removes HIP/CUDA from known failures for `UseHostPointer` conformance test. |
| unified-runtime/source/adapters/hip/memory.cpp | Switches `USE_HOST_POINTER` to `UseHostPtr` alloc mode and adjusts registration behavior/error handling. |
| unified-runtime/source/adapters/cuda/memory.cpp | Switches `USE_HOST_POINTER` to `UseHostPtr` alloc mode and adjusts registration error handling. |

Comments suppressed due to low confidence (1)

unified-runtime/source/adapters/hip/memory.cpp:107

UR_MEM_FLAG_USE_HOST_POINTER now sets AllocMode::UseHostPtr, but urMemBufferCreate no longer registers the host memory. Since BufferMem::clear() unconditionally calls hipHostUnregister(HostPtr) for UseHostPtr, releasing a buffer that was never used on a device will attempt to unregister an unregistered pointer and can fail/crash. Register the host pointer in urMemBufferCreate when USE_HOST_POINTER is set, or track whether registration actually happened and only unregister in that case.

    auto HostPtr = pProperties ? pProperties->pHost : nullptr;
    BufferMem::AllocMode AllocMode = BufferMem::AllocMode::Classic;
    if (flags & UR_MEM_FLAG_USE_HOST_POINTER) {
      AllocMode = BufferMem::AllocMode::UseHostPtr;
    } else if (flags & UR_MEM_FLAG_ALLOC_HOST_POINTER) {
      UR_CHECK_ERROR(hipHostMalloc(&HostPtr, size));
      AllocMode = BufferMem::AllocMode::AllocHostPtr;
    } else if (flags & UR_MEM_FLAG_ALLOC_COPY_HOST_POINTER) {
      AllocMode = BufferMem::AllocMode::CopyIn;
    }
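
The second option in the comment above (track whether registration actually happened and unregister only in that case) might look like the sketch below on the release path. `mockHostUnregister`, `pinnedRegistry` and `BufferSketch` are hypothetical; the mock stands in for hipHostUnregister and fails on a pointer that was never registered, which is the failure mode the comment describes.

```cpp
#include <cassert>
#include <set>

// Mock of the driver's table of pinned host pointers (assumption for the sketch).
static std::set<const void *> &pinnedRegistry() {
  static std::set<const void *> Registry;
  return Registry;
}

// Mock of hipHostUnregister: returns false for a pointer that is not registered.
bool mockHostUnregister(const void *Ptr) {
  return pinnedRegistry().erase(Ptr) == 1;
}

struct BufferSketch {
  void *HostPtr = nullptr;
  bool HostPtrRegisteredByUR = false;

  // Returns false only if an unregister call was attempted and failed.
  bool clear() {
    if (!HostPtrRegisteredByUR)
      return true; // never registered by us: nothing to undo
    HostPtrRegisteredByUR = false; // guard against a second unregister
    return mockHostUnregister(HostPtr);
  }
};
```

A buffer that is created and released without ever being used on a device then skips the unregister call entirely, and a repeated clear() on a registered buffer does not unregister twice.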

Copilot

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

ldorau

Two issues left:

Copilot

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated no new comments.

ldorau

LGTM

Copilot

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated no new comments.

ldorau · 2026-04-16T10:57:27Z

Review please @intel/llvm-reviewers-cuda

ldorau · 2026-04-16T11:00:37Z

@kweronsx please update the description of this PR (it is empty now) #21725 (comment), because it will be used as the description of the final commit squashed from all commits from this PR.

bratpiorka · 2026-04-19T08:50:04Z

LGTM but please squash commits

bratpiorka · 2026-04-22T18:35:00Z

@kswiecicki ready to merge

github-actions · 2026-04-22T18:39:18Z

@intel/llvm-gatekeepers please consider merging

kweronsx force-pushed the test/fix-cuda-use-host-pointer-buff branch 3 times, most recently from 8a2d0e3 to 6d27f2a Compare April 14, 2026 14:05

kweronsx marked this pull request as ready for review April 15, 2026 09:12

kweronsx requested review from a team as code owners April 15, 2026 09:12

kweronsx requested a review from kekaczma April 15, 2026 09:12

ldorau requested a review from Copilot April 15, 2026 12:06

Copilot started reviewing on behalf of ldorau April 15, 2026 12:08

Copilot AI reviewed Apr 15, 2026

Comment thread unified-runtime/source/adapters/hip/memory.cpp

Comment thread unified-runtime/source/adapters/cuda/memory.cpp Outdated

Comment thread unified-runtime/source/adapters/cuda/memory.cpp

ldorau requested changes Apr 15, 2026

Comment thread unified-runtime/source/adapters/cuda/memory.cpp Outdated

kweronsx marked this pull request as draft April 15, 2026 12:25

kweronsx requested a review from ldorau April 15, 2026 13:39

kweronsx marked this pull request as ready for review April 15, 2026 13:39

ldorau requested a review from Copilot April 15, 2026 13:56

Copilot started reviewing on behalf of ldorau April 15, 2026 13:58

Copilot AI reviewed Apr 15, 2026

Comment thread unified-runtime/source/adapters/cuda/memory.cpp Outdated

ldorau requested changes Apr 15, 2026

Comment thread unified-runtime/source/adapters/cuda/memory.cpp

Comment thread unified-runtime/source/adapters/hip/memory.hpp

Comment thread unified-runtime/source/adapters/cuda/memory.cpp Outdated

kweronsx marked this pull request as draft April 16, 2026 06:06

kweronsx marked this pull request as ready for review April 16, 2026 07:12

kweronsx requested a review from ldorau April 16, 2026 07:12

ldorau requested changes Apr 16, 2026

Comment thread unified-runtime/source/adapters/cuda/memory.cpp Outdated

Comment thread unified-runtime/source/adapters/cuda/memory.hpp Outdated

kweronsx marked this pull request as draft April 16, 2026 08:18

kweronsx requested a review from ldorau April 16, 2026 09:18

kweronsx marked this pull request as ready for review April 16, 2026 09:19

ldorau requested changes Apr 16, 2026

Comment thread unified-runtime/source/adapters/hip/memory.hpp

kweronsx marked this pull request as draft April 16, 2026 09:34

kweronsx marked this pull request as ready for review April 16, 2026 10:12

kweronsx requested a review from ldorau April 16, 2026 10:12

ldorau requested a review from Copilot April 16, 2026 10:16

Copilot started reviewing on behalf of ldorau April 16, 2026 10:17

Copilot AI reviewed Apr 16, 2026

ldorau requested changes Apr 16, 2026

Comment thread unified-runtime/source/adapters/hip/memory.hpp

kweronsx marked this pull request as draft April 16, 2026 10:29

kweronsx marked this pull request as ready for review April 16, 2026 10:35

kweronsx requested a review from ldorau April 16, 2026 10:35

ldorau requested a review from Copilot April 16, 2026 10:38

ldorau approved these changes Apr 16, 2026

Copilot started reviewing on behalf of ldorau April 16, 2026 10:40

Copilot AI reviewed Apr 16, 2026

bratpiorka approved these changes Apr 19, 2026

kweronsx force-pushed the test/fix-cuda-use-host-pointer-buff branch 2 times, most recently from 3879504 to 0be7c05 Compare April 21, 2026 10:13

KornevNikita merged commit d5f040b into sycl Apr 23, 2026
60 of 62 checks passed

kweronsx deleted the test/fix-cuda-use-host-pointer-buff branch April 23, 2026 10:26
