Misc. bug: Is this a backend VRAM tracking bug? #24159

@Diablo-D3

Name and Version

version: 9512 (0dbfa66)
built with Clang 19.1.5 for Windows x86_64

Operating systems

Windows

Which llama.cpp modules do you know to be affected?

llama-cli

Command line

llama-vulkan/llama-cli.exe -hf unsloth/Qwen3.6-27B-MTP-GGUF:Q5_K_XL -ngl all --fit on -lv 4 --no-mmap --fa on --cache-type-k x --cache-type-v x --spec-type draft-mtp --spec-draft-n-max 2 --spec-draft-type-k x --spec-draft-type-v x 
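The command above uses x as a placeholder for the cache type. As a minimal sketch, the four draft-enabled combinations could be generated like this. It assumes that each x stands for either f16 or q8_0, and that the K and V caches share one type, as the f16/f16 and q8_0/q8_0 rows in the tables suggest. The names BASE, CACHE_TYPES and build_commands are hypothetical helpers, not part of llama.cpp. The "ntp" rows are left out because the report does not show their command line.

```python
from itertools import product

# Base command copied from the report; {kv} and {draft} replace the "x" placeholders.
BASE = (
    "llama-vulkan/llama-cli.exe -hf unsloth/Qwen3.6-27B-MTP-GGUF:Q5_K_XL "
    "-ngl all --fit on -lv 4 --no-mmap --fa on "
    "--cache-type-k {kv} --cache-type-v {kv} "
    "--spec-type draft-mtp --spec-draft-n-max 2 "
    "--spec-draft-type-k {draft} --spec-draft-type-v {draft}"
)

# Assumption: only the two cache types that appear in the tables are tested.
CACHE_TYPES = ["f16", "q8_0"]


def build_commands():
    """Return one command line per (kv cache type, draft kv cache type) pair."""
    return [
        BASE.format(kv=kv, draft=draft)
        for kv, draft in product(CACHE_TYPES, CACHE_TYPES)
    ]


for cmd in build_commands():
    print(cmd)
```

The script only prints the command lines. It does not run them or parse any output, since the log format is not shown in the report.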

Problem description & steps to reproduce

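When iterating over KV cache and draft KV cache quantization types (the x placeholders in the command above), the context size reported for each combination, and therefore the implied VRAM usage, is not consistent across backends.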

HIP:

kv cache  | draft kv  | context
f16/f16   | ntp       |   49408
q8_0/q8_0 | ntp       |   82944
f16/f16   | f16/f16   |   13312
f16/f16   | q8_0/q8_0 |    4608
q8_0/q8_0 | f16/f16   |   22784
q8_0/q8_0 | q8_0/q8_0 |    8192

Vulkan:

kv cache  | draft kv  | context
f16/f16   | ntp       |   39936
q8_0/q8_0 | ntp       |   74496
f16/f16   | f16/f16   |    8960
f16/f16   | q8_0/q8_0 |   16640
q8_0/q8_0 | f16/f16   |   17408
q8_0/q8_0 | q8_0/q8_0 |   31488 

According to #24102 (comment) the HIP numbers are correct, but the Vulkan numbers are not.
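To make the discrepancy easier to see, here is a small sketch that computes the Vulkan/HIP context ratio for every row. The numbers are copied from the two tables above. The dictionary layout and the ratios helper are illustrative only.

```python
# Context sizes copied from the tables in this report.
# Keys are (kv cache types, draft kv cache types).
HIP = {
    ("f16/f16", "ntp"): 49408,
    ("q8_0/q8_0", "ntp"): 82944,
    ("f16/f16", "f16/f16"): 13312,
    ("f16/f16", "q8_0/q8_0"): 4608,
    ("q8_0/q8_0", "f16/f16"): 22784,
    ("q8_0/q8_0", "q8_0/q8_0"): 8192,
}
VULKAN = {
    ("f16/f16", "ntp"): 39936,
    ("q8_0/q8_0", "ntp"): 74496,
    ("f16/f16", "f16/f16"): 8960,
    ("f16/f16", "q8_0/q8_0"): 16640,
    ("q8_0/q8_0", "f16/f16"): 17408,
    ("q8_0/q8_0", "q8_0/q8_0"): 31488,
}


def ratios():
    """Return the Vulkan/HIP context ratio for each configuration."""
    return {key: VULKAN[key] / HIP[key] for key in HIP}


for (kv, draft), ratio in ratios().items():
    print(f"{kv:<10}| {draft:<10}| Vulkan/HIP = {ratio:.2f}")
```

On these numbers, Vulkan reports between roughly 0.67x and 0.90x of the HIP context in every row except the two with a q8_0 draft KV cache, where it reports about 3.6x and 3.8x as much. Switching the draft KV cache from f16 to q8_0 also moves the context in opposite directions: down on HIP (13312 to 4608), up on Vulkan (8960 to 16640).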

First Bad Commit

N/A

Relevant log output

N/A
