Conversation
8a2d0e3 to
6d27f2a
Compare
|
This PR started from an attempt to get the UseHostPointer conformance test in urMemBufferCreate to pass for CUDA and HIP. UR_MEM_FLAG_USE_HOST_POINTER had been silently treated as a copy operation in both adapters, hidden behind a hardcoded EnableUseHostPtr = false guard. The guard was accompanied by a comment citing a "weird segfault after program ends," but no such segfault was observed, so the dead code has been removed. Enabling the feature revealed one real issue: cuMemHostRegister/hipHostRegister is called once during buffer creation in urMemBufferCreate, and then again in allocateMemObjOnDeviceIfNeeded at kernel submission time, resulting in a double-registration. This is handled by tolerating CUDA_ERROR_HOST_MEMORY_ALREADY_REGISTERED/hipErrorHostMemoryAlreadyRegistered in allocateMemObjOnDeviceIfNeeded instead of propagating it as an error. |
There was a problem hiding this comment.
Pull request overview
This PR updates Unified Runtime buffer creation behavior to better support UR_MEM_FLAG_USE_HOST_POINTER by using host-memory registration (CUDA/HIP) rather than falling back to an initial copy, and adjusts conformance expectations accordingly.
Changes:
- Enable
UR_MEM_FLAG_USE_HOST_POINTERpaths for HIP and CUDA adapters (removing the previous “disabled” gating logic). - Refine “initial copy” logic so it only triggers for
UR_MEM_FLAG_ALLOC_COPY_HOST_POINTER. - Update conformance test expectations by removing HIP/CUDA from the known-failing
UseHostPointertest list.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| unified-runtime/test/conformance/memory/urMemBufferCreate.cpp | Removes HIP/CUDA from known failures for UseHostPointer conformance test. |
| unified-runtime/source/adapters/hip/memory.cpp | Switches USE_HOST_POINTER to UseHostPtr alloc mode and adjusts registration behavior/error handling. |
| unified-runtime/source/adapters/cuda/memory.cpp | Switches USE_HOST_POINTER to UseHostPtr alloc mode and adjusts registration error handling. |
Comments suppressed due to low confidence (1)
unified-runtime/source/adapters/hip/memory.cpp:107
UR_MEM_FLAG_USE_HOST_POINTERnow setsAllocMode::UseHostPtr, buturMemBufferCreateno longer registers the host memory. SinceBufferMem::clear()unconditionally callshipHostUnregister(HostPtr)forUseHostPtr, releasing a buffer that was never used on a device will attempt to unregister an unregistered pointer and can fail/crash. Register the host pointer inurMemBufferCreatewhenUSE_HOST_POINTERis set, or track whether registration actually happened and only unregister in that case.
auto HostPtr = pProperties ? pProperties->pHost : nullptr;
BufferMem::AllocMode AllocMode = BufferMem::AllocMode::Classic;
if (flags & UR_MEM_FLAG_USE_HOST_POINTER) {
AllocMode = BufferMem::AllocMode::UseHostPtr;
} else if (flags & UR_MEM_FLAG_ALLOC_HOST_POINTER) {
UR_CHECK_ERROR(hipHostMalloc(&HostPtr, size));
AllocMode = BufferMem::AllocMode::AllocHostPtr;
} else if (flags & UR_MEM_FLAG_ALLOC_COPY_HOST_POINTER) {
AllocMode = BufferMem::AllocMode::CopyIn;
}
|
Review please @intel/llvm-reviewers-cuda |
|
@kweronsx please update the description of this PR (it is empty now) #21725 (comment), because it will be used as the description of the final commit squashed from all commits from this PR. |
|
LGTM but please squash commits |
3879504 to
0be7c05
Compare
…tration This PR fixes issue #9789 (#9789) where UR_MEM_FLAG_USE_HOST_POINTER was silently ignored in the CUDA and HIP adapters, causing the host data to be copied to the device instead of being mapped. What changed: Removed the dead EnableUseHostPtr = false flag in both cuda/memory.cpp and hip/memory.cpp. This flag was hardcoded to false with a TODO comment, meaning the USE_HOST_PTR path was never actually taken — the buffer always fell back to copying. Enabled the UseHostPtr allocation mode by properly entering that branch when UR_MEM_FLAG_USE_HOST_POINTER is set. The hipHostRegister / cuMemHostRegister call was moved from urMemBufferCreate into allocateMemObjOnDeviceIfNeeded (i.e. it is deferred to lazy device allocation), which is where it already lived. Added HostPtrRegisteredByUR ownership flag to BufferMem in both headers. This boolean is set to true only when UR itself performed the registration. It ensures cuMemHostUnregister / hipHostUnregister is only called by the owner, preventing a double-unregister. Handled already-registered pointers gracefully: CUDA_ERROR_HOST_MEMORY_ALREADY_REGISTERED / hipErrorHostMemoryAlreadyRegistered is now tolerated in allocateMemObjOnDeviceIfNeeded. This covers cases where the same host pointer is registered across multiple lazy device allocations (e.g. multi-device contexts). Conformance test update: CUDA and HIP are removed from the UseHostPointer known-failure list in urMemBufferCreate.cpp, reflecting that the feature now works correctly on those backends.
|
@kswiecicki ready to merge |
|
@intel/llvm-gatekeepers please consider merging |
This PR fixes a long-standing issue (#9789) where UR_MEM_FLAG_USE_HOST_POINTER was silently ignored in the CUDA and HIP adapters, causing the host data to be copied to the device instead of being mapped.
What changed:
Removed the dead EnableUseHostPtr = false flag in both cuda/memory.cpp and hip/memory.cpp. This flag was hardcoded to false with a TODO comment, meaning the USE_HOST_PTR path was never actually taken — the buffer always fell back to copying.
Enabled the UseHostPtr allocation mode by properly entering that branch when UR_MEM_FLAG_USE_HOST_POINTER is set. The hipHostRegister / cuMemHostRegister call was moved from urMemBufferCreate into allocateMemObjOnDeviceIfNeeded (i.e. it is deferred to lazy device allocation), which is where it already lived.
Added HostPtrRegisteredByUR ownership flag to BufferMem in both headers. This boolean is set to true only when UR itself performed the registration. It ensures cuMemHostUnregister / hipHostUnregister is only called by the owner, preventing a double-unregister.
Handled already-registered pointers gracefully: CUDA_ERROR_HOST_MEMORY_ALREADY_REGISTERED / hipErrorHostMemoryAlreadyRegistered is now tolerated in allocateMemObjOnDeviceIfNeeded. This covers cases where the same host pointer is registered across multiple lazy device allocations (e.g. multi-device contexts).
Explicit copy constructors were added to BufferMem in both adapters to deliberately not copy HostPtrRegisteredByUR, making ownership semantics clear — only the original object is responsible for unregistering.
Conformance test update: CUDA and HIP are removed from the UseHostPointer known-failure list in urMemBufferCreate.cpp, reflecting that the feature now works correctly on those backends.