Name and Version
version: b9453
Operating systems
Windows, Linux Ubuntu
Which llama.cpp modules do you know to be affected?
CUDA/HIP module
Command line
N/A
Problem description
This issue led to this PR. As a result, the device field that records whether a device is integrated is always set to false. This causes problems when a system has both an iGPU and a dGPU: since HIP reports both as non-integrated devices, work is allocated to both of them as if they were both dGPUs.
I've traced the problem to the function ggml_backend_cuda_device_supports_buft, which returns the wrong bool because the field is not set properly.
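To illustrate the effect described above, here is a minimal, self-contained sketch. It is not the llama.cpp code: the types `reported_props` and `device_info` and the functions `collect_devices` and `count_treated_as_discrete` are hypothetical names used only to model how forcing the flag to false makes an iGPU indistinguishable from a dGPU.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for the properties the runtime reports per device.
struct reported_props {
    std::string name;
    bool        integrated;
};

// Hypothetical stand-in for the per-device info the backend keeps.
struct device_info {
    std::string name;
    bool        integrated;
};

// Copies the reported properties into backend device info.
// When honor_integrated is false, the flag is forced to false for every
// device, which models the behavior described in this issue.
static std::vector<device_info> collect_devices(const std::vector<reported_props> & props,
                                                bool honor_integrated) {
    std::vector<device_info> out;
    for (const auto & p : props) {
        out.push_back({p.name, honor_integrated ? p.integrated : false});
    }
    return out;
}

// Counts the devices that later code would treat as discrete GPUs.
static int count_treated_as_discrete(const std::vector<device_info> & devs) {
    int n = 0;
    for (const auto & d : devs) {
        if (!d.integrated) {
            n++;
        }
    }
    return n;
}
```

With one dGPU and one iGPU as input, forcing the flag to false yields two devices treated as discrete, while honoring the reported value yields one.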
Steps to reproduce
- Turn on integrated graphics in BIOS
- Compile and run a model with llama.cpp using HIP backend
Let me know if a reproduction script is necessary.
Proposed and working change
Add a guard so that, when llama.cpp is compiled for HIP, the field is set from prop.integrated instead of being forced to false.
// ggml/src/ggml-cuda/ggml-cuda.cu line 248
#if defined(GGML_USE_HIP)
info.devices[id].integrated = prop.integrated;
#else
info.devices[id].integrated = false; // Temporarily disabled due to issues with corrupted output (e.g. #15034)
#endif
If any other information or clarification is needed, or if this change wouldn't work, please let me know. As a note, I built llama.cpp from source with the proposed change above and the fix worked. I can provide a reproduction script if needed. I have tested this on both gfx1100 and gfx1201.
First Bad Commit
Not necessarily a bad commit, but the change it made is not specific to HIP.
PR #16308
Relevant log output
You can reference the symptoms from the linked issues (lemonade-sdk/llamacpp-rocm#96 and ROCm/ROCm#6227).