[OpenVINO] Tensor element type mismatch (f32 != f16) when running Qwen2.5-Coder on NPU #24191

@ZijinWukll

Environment

  • llama.cpp build: local build from source (no git tag)
  • Hardware: Intel Core Ultra 9 285H (Arrow Lake), no dGPU, Intel NPU
  • OS: Windows 11
  • OpenVINO: 2025.4
  • Model: Qwen2.5-Coder-7B-Instruct (GGUF, both Q4_K_M and Q4_0 tested)

Build Configuration

  • GGML_OPENVINO=ON
  • GGML_OPENVINO_NO_OPENCL=ON (no OpenCL SDK available on the system)
  • Windows SDK + MSVC 19.50

Bug Description

The model loads successfully, but inference fails with the following exception:

    GGML OpenVINO backend ov::Exception: Exception from infer_request.cpp:224:
    Check 'dst->get_element_type() == get_element_type()' failed at itensor.cpp:79:
    Tensor element types are not equal. (src: f32 != dst: f16)

This occurs with GGML_OPENVINO_DEVICE set to either NPU or CPU.
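
For readers unfamiliar with this kind of check, here is a minimal Python sketch of a guard like the one named in the exception. It is an illustration only, not OpenVINO's implementation: `FakeTensor`, `copy_to`, and `cast` are hypothetical stand-ins, and the only details taken from the log are the message text and the f32/f16 pairing.

```python
import struct

# struct format codes for the two element types in the log:
# "f" is a 4-byte float (f32), "e" is a 2-byte half float (f16).
ELEMENT_FORMATS = {"f32": "f", "f16": "e"}


class FakeTensor:
    """Hypothetical typed buffer; not an OpenVINO class."""

    def __init__(self, element_type, values):
        self.element_type = element_type
        fmt = "<%d%s" % (len(values), ELEMENT_FORMATS[element_type])
        self.data = struct.pack(fmt, *values)

    def copy_to(self, dst):
        # Same spirit as the failing check: refuse a raw copy across element types.
        if dst.element_type != self.element_type:
            raise TypeError(
                "Tensor element types are not equal. (src: %s != dst: %s)"
                % (self.element_type, dst.element_type)
            )
        dst.data = self.data


def cast(tensor, element_type):
    # Explicit conversion: unpack with the source format, repack with the target one.
    code = ELEMENT_FORMATS[tensor.element_type]
    count = len(tensor.data) // struct.calcsize("<" + code)
    values = struct.unpack("<%d%s" % (count, code), tensor.data)
    return FakeTensor(element_type, values)
```

An f32 buffer holds 4 bytes per element and an f16 buffer holds 2, so a raw copy between them cannot be valid; one side has to convert explicitly, as `cast` does here. Where the f32 source and f16 destination originate inside the backend is not established by this report.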

Steps to Reproduce

  1. Build llama.cpp with GGML_OPENVINO=ON on Windows
  2. Download Qwen2.5-Coder-7B-Instruct Q4_K_M or Q4_0 GGUF
  3. Run: llama-cli -m <model> --conversation
  4. Send any prompt. Inference fails with a compute error and the exception above.
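
The steps above can be scripted so that both device settings are tried the same way. This is a sketch under assumptions: it expects `llama-cli` to be on `PATH`, the model path is a placeholder you supply, and `build_repro` and `run_repro` are hypothetical helper names. Only the `-m` and `--conversation` flags and the `GGML_OPENVINO_DEVICE` variable come from this report.

```python
import os
import subprocess


def build_repro(model_path, device):
    """Return the command and environment for one reproduction run."""
    cmd = ["llama-cli", "-m", model_path, "--conversation"]
    env = dict(os.environ)
    env["GGML_OPENVINO_DEVICE"] = device  # "NPU" or "CPU", as in the report
    return cmd, env


def run_repro(model_path, device):
    # Starts an interactive session; send any prompt to trigger the error.
    cmd, env = build_repro(model_path, device)
    return subprocess.run(cmd, env=env).returncode
```

Calling `run_repro` once with `"NPU"` and once with `"CPU"` covers both cases described in the bug description.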
