[OpenVINO] Tensor element type mismatch (f32 != f16) when running Qwen2.5-Coder on NPU #24191

## Environment
- llama.cpp build: local build from source (no git tag)
- Hardware: Intel Core Ultra 9 285H (Arrow Lake), no discrete GPU, Intel NPU
- OS: Windows 11
- OpenVINO: 2025.4
- Model: Qwen2.5-Coder-7B-Instruct (GGUF; both Q4_K_M and Q4_0 tested)

## Build Configuration
- `GGML_OPENVINO=ON`
- `GGML_OPENVINO_NO_OPENCL=ON` (no OpenCL SDK is available on the system)
- Toolchain: Windows SDK + MSVC 19.50
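A configure-and-build sequence matching the options above might look like the sketch below. It assumes it is run from the llama.cpp source root in a shell where the OpenVINO environment is already set up (for an archive install, typically by running its `setupvars` script first); the `build` directory name and the `Release` configuration are illustrative choices, not part of the original report.

```shell
# Sketch only: run from the llama.cpp source root.
# Enable the OpenVINO backend and skip OpenCL, matching the options listed above.
cmake -B build -DGGML_OPENVINO=ON -DGGML_OPENVINO_NO_OPENCL=ON
# Visual Studio generators are multi-config, so the configuration is chosen at build time.
cmake --build build --config Release
```

With the Visual Studio generator, binaries such as `llama-cli.exe` typically end up under `build\bin\Release`.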

## Bug Description
The model loads successfully, but inference fails with:

    GGML OpenVINO backend ov::Exception: Exception from infer_request.cpp:224:
    Check 'dst->get_element_type() == get_element_type()' failed at itensor.cpp:79:
    Tensor element types are not equal. (src: f32 != dst: f16)

The same error occurs with `GGML_OPENVINO_DEVICE` set to either `NPU` or `CPU`.

## Steps to Reproduce
1. Build llama.cpp on Windows with `GGML_OPENVINO=ON`.
2. Download the Qwen2.5-Coder-7B-Instruct GGUF (Q4_K_M or Q4_0).
3. With `GGML_OPENVINO_DEVICE` set to `NPU` (or `CPU`), run: `llama-cli -m <model> --conversation`
4. Send any prompt; generation fails with a compute error and the exception above.
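Because the failure reproduces on both NPU and CPU, a small helper that pins `GGML_OPENVINO_DEVICE` before launching `llama-cli` makes it easy to rerun the same steps on each device. This is a minimal POSIX-shell sketch (e.g. for Git Bash on Windows); the `run_on_device` name and the `LLAMA_CLI` override variable are hypothetical, and under PowerShell the equivalent is setting `$env:GGML_OPENVINO_DEVICE` before running the command.

```shell
# Hypothetical helper: run llama-cli with the OpenVINO backend pinned to one device.
# Usage: run_on_device DEVICE MODEL_PATH
# LLAMA_CLI (hypothetical override) may point at the built binary,
# e.g. build/bin/Release/llama-cli.exe; otherwise llama-cli from PATH is used.
run_on_device() {
  device="$1"
  model="$2"
  # The assignment prefix sets GGML_OPENVINO_DEVICE only for this one command.
  GGML_OPENVINO_DEVICE="$device" "${LLAMA_CLI:-llama-cli}" -m "$model" --conversation
}
```

For example, `run_on_device NPU <model>` followed by `run_on_device CPU <model>`; per the report, both runs hit the same `ov::Exception` once a prompt is sent.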
