Environment
- llama.cpp build: local build from source (no git tag)
- Hardware: Intel Core Ultra 9 285H (Arrow Lake), no dGPU, Intel NPU
- OS: Windows 11
- OpenVINO: 2025.4
- Model: Qwen2.5-Coder-7B-Instruct (GGUF, both Q4_K_M and Q4_0 tested)
Build Configuration
- GGML_OPENVINO=ON
- GGML_OPENVINO_NO_OPENCL=ON (no OpenCL SDK available on the system)
- Windows SDK + MSVC 19.50
Bug Description
Model loads successfully but fails during inference with:
GGML OpenVINO backend ov::Exception: Exception from infer_request.cpp:224:
Check 'dst->get_element_type() == get_element_type()' failed at itensor.cpp:79:
Tensor element types are not equal. (src: f32 != dst: f16)
This occurs with GGML_OPENVINO_DEVICE set to either NPU or CPU.
Steps to Reproduce
- Build llama.cpp with GGML_OPENVINO=ON on Windows
- Download Qwen2.5-Coder-7B-Instruct Q4_K_M or Q4_0 GGUF
- Run:
llama-cli -m <model> --conversation
- Send any prompt. Inference fails with a compute error and the exception shown above.
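To check both devices without retyping the command, the steps above can be wrapped in a small script. This is a sketch, not part of the original report: the helper names `build_invocation` and `run_repro` are hypothetical, and it assumes `llama-cli` is on `PATH` (otherwise pass the full path to the executable). It uses only the `GGML_OPENVINO_DEVICE` variable and the `-m` / `--conversation` flags already listed above.

```python
import os
import subprocess

# Devices named in the report; the failure was seen with both.
DEVICES = ("NPU", "CPU")


def build_invocation(model_path, device, llama_cli="llama-cli"):
    """Return (argv, env) for one reproduction run on the given device."""
    # Copy the current environment and override only the device selector.
    env = dict(os.environ)
    env["GGML_OPENVINO_DEVICE"] = device
    argv = [llama_cli, "-m", model_path, "--conversation"]
    return argv, env


def run_repro(model_path, llama_cli="llama-cli"):
    """Launch llama-cli once per device; each session is interactive."""
    for device in DEVICES:
        argv, env = build_invocation(model_path, device, llama_cli)
        print(f"--- GGML_OPENVINO_DEVICE={device}: {' '.join(argv)}")
        # Send any prompt in the session, then exit to continue with the next device.
        subprocess.run(argv, env=env, check=False)
```

Calling `run_repro("path/to/model.gguf")` starts one interactive session per device. Per the report, the exception is expected after the first prompt in each session.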