Update dependency ggml-org/llama.cpp to v9566 by renovate[bot] · Pull Request #222 · henrywang/lux

renovate · 2026-06-08T18:52:34Z

ℹ️ Note

This PR body was truncated due to platform limits.

This PR contains the following updates:

| Package | Update | Change |
|---|---|---|
| ggml-org/llama.cpp | major | `b9066` → `b9566` |

Release Notes

ggml-org/llama.cpp (ggml-org/llama.cpp)

`vb9566`

Compare Source

Details

graph: guard iswa kq_mask on its own buffer (#24294)

A SWA-only draft head (e.g. StepFun MTP) leaves the base sub-cache
empty, so its kq_mask buffer stays null and asserts at load. Guard
each mask on its own buffer in set_input and can_reuse, base and swa.
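
The guard described above can be pictured roughly as follows. This is a minimal sketch, not the llama.cpp implementation: `mask_input`, `iswa_inputs`, `fill_mask`, and `set_masks` are hypothetical stand-ins. It only illustrates the idea that each mask is touched if and only if its own buffer exists, so an empty base sub-cache is skipped rather than asserted on.

```cpp
#include <vector>

// Hypothetical stand-in for a graph input: `buffer` stays null when the
// matching sub-cache is empty and nothing was allocated for it.
struct mask_input {
    std::vector<float> * buffer = nullptr;
};

// Hypothetical pair of masks: one for the base sub-cache, one for the SWA sub-cache.
struct iswa_inputs {
    mask_input kq_mask_base;
    mask_input kq_mask_swa;
};

// Fill a single mask only if it has backing storage.
// Returns true when the mask was actually written.
static bool fill_mask(mask_input & m, float value) {
    if (m.buffer == nullptr) {
        return false; // no buffer: skip instead of asserting
    }
    for (float & x : *m.buffer) {
        x = value;
    }
    return true;
}

// Guard the base and SWA masks independently; returns how many were filled.
static int set_masks(iswa_inputs & in, float value) {
    int n_filled = 0;
    n_filled += fill_mask(in.kq_mask_base, value) ? 1 : 0;
    n_filled += fill_mask(in.kq_mask_swa,  value) ? 1 : 0;
    return n_filled;
}
```

With only the SWA buffer set, `set_masks` writes one mask and leaves the base mask alone, which mirrors the SWA-only draft head case.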

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

macOS/iOS:

macOS Apple Silicon (arm64)
macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED
macOS Intel (x64)
iOS XCFramework

Linux:

Android:

Android arm64 (CPU)

Windows:

openEuler:

DISABLED
openEuler x86 (310p)
openEuler x86 (910b, ACL Graph)
openEuler aarch64 (310p)
openEuler aarch64 (910b, ACL Graph)

UI:

UI

`vb9565`

Compare Source

Details

[ggml-webgpu] Handle buffer overlap / buffer aliasing for concat operator (#24000)

Only run webgpu CI on my fork
Add webgpu only workflow
handle buffer overlap case for concat operator
restore build-webgpu.yml

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Run clang-format
Update ggml/src/ggml-webgpu/wgsl-shaders/concat.wgsl

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Reese Levine <reeselevine1@gmail.com>
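
The release note does not say how the WebGPU backend resolves the overlap, so the following is only a sketch of the detection step, assuming buffers are described by a byte offset and a size. `byte_range` and `ranges_overlap` are hypothetical names, not part of ggml.

```cpp
#include <cstddef>

// Hypothetical description of a tensor's slice of a backing buffer.
struct byte_range {
    std::size_t offset; // first byte of the range
    std::size_t size;   // length in bytes
};

// Two half-open ranges [offset, offset + size) overlap when each one starts
// before the other ends. Empty ranges never overlap anything.
static bool ranges_overlap(const byte_range & a, const byte_range & b) {
    if (a.size == 0 || b.size == 0) {
        return false;
    }
    return a.offset < b.offset + b.size && b.offset < a.offset + a.size;
}
```

A concat whose destination range overlaps a source range would need special handling (for example a different copy order or a staging copy); adjacent ranges that merely touch do not count as overlapping.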

`vb9564`

Compare Source

Details

[ggml-webgpu] Implement 2D workgroups for scale, binary, and unary ops (#24044)

Only run webgpu CI on my fork
Add webgpu only workflow
Implement 2d workgroups for more operations
fix
Fix type
Move back to global_invocation_id

`vb9562`

Compare Source

Details

mtmd : add video input support (#24269)

wip
ok: lazy bitmap API
remember to free lazy text
wip
add mtmd_helper_video
support video input on server (base64 input)
add MTMD_VIDEO config
add timestamp
update CLI
cli: allow auto-completion for video
add --video arg
fix build
update docs
rename as suggested

`vb9561`

Compare Source

Details

sync : ggml

`vb9559`

Compare Source

Details

cli: fix spinner not show during prompt processing (#24283)

`vb9558`

Compare Source

Details

vulkan: Use cm2 decode_vector for mul_mat_id B matrix loads (#23991)

This allows vec4 loads of the B elements. Also increase BK to 64 when this is
enabled. Neither of these alone is consistently faster, but together these give
a nice speedup.

In ggml-vulkan.cpp, we need to make sure the B matrix alignment and stride are
multiples of 4.
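
As a rough illustration of the multiple-of-4 requirement, the check might look like the sketch below. The function name `can_use_vec4_b_loads` is hypothetical, and it assumes offset and stride are counted in elements; the release note does not state the unit or the actual code in ggml-vulkan.cpp.

```cpp
#include <cstdint>

// A vec4 load reads four consecutive elements at once, so both the starting
// offset of the B matrix and the stride between its rows must be multiples
// of 4 (assumed here to be measured in elements).
static bool can_use_vec4_b_loads(uint32_t b_offset_elems, uint32_t b_stride_elems) {
    return b_offset_elems % 4 == 0 && b_stride_elems % 4 == 0;
}
```

When the check fails, a backend would typically fall back to a scalar-load path rather than issue misaligned vector loads.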

`vb9557`

Compare Source

Details

cuda: reset cuda context after reading memory size (#23935)

cuda: reset device in get_memory function if no backend is active
also count device and host buffers
exclude hip and musa from counting and device reset
use device mutex instead of atomic
undo backend_free function move

`vb9556`

Compare Source

Details

HIP: add gfx1152 and gfx1153 to RDNA3.5 (#24129)

`vb9555`

Compare Source

Details

metal : fix im2col 1D case (audio models) (#24220)

`vb9553`

Compare Source

Details

common : relax sampler name matching (#23744)

common : relax sampler name matching

Currently, in some cases, the alternative names for samplers (like
top-k and min-p instead of the canonical top_k and min_p) are
not always recognized by the common_sampler_types_from_names function
in common/sampling.cpp.

This PR changes the signature of this function to remove the bool allow_alt_names flag, and removes all occurrences of the flag from call
sites. Therefore, the function will now always match all known names.

I also changed the logic of the function to unconditionally check the
provided sampler names against both the canonical and alternative names,
and to be case-insensitive.

This fixes an issue I was seeing wherein samplers specified in the
llama-server UI were not recognized as valid when the alternative
names were used.

add more alt names
cont. fix
cast to unsigned char for correctness
common : unify sampler name mapping
annotate canonical vs. alt sampler name mappings per @CISC
Update common/sampling.cpp

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

common : auto-generate sampler name aliases per @ngxson
use merged map for matching
use .merge instead of iterating
nit: simplify comment
nit: use insert everywhere, not index assignment

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
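
The matching behaviour described in this entry (case-insensitive, canonical and alternative names always accepted, aliases generated automatically) could be sketched as below. This is not the code in common/sampling.cpp: `sampler_type`, `build_name_map`, and `sampler_from_name` are hypothetical, and deriving aliases by swapping `_` for `-` is an assumption based on the `top_k` / `top-k` example.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <unordered_map>

// Hypothetical subset of sampler kinds, for illustration only.
enum class sampler_type { none, top_k, top_p, min_p, temperature };

// Lower-case a name. The cast to unsigned char avoids undefined behaviour
// when std::tolower receives a negative char value.
static std::string to_lower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// Build one merged map holding canonical names plus generated aliases.
static std::unordered_map<std::string, sampler_type> build_name_map() {
    const std::unordered_map<std::string, sampler_type> canonical = {
        {"top_k",       sampler_type::top_k},
        {"top_p",       sampler_type::top_p},
        {"min_p",       sampler_type::min_p},
        {"temperature", sampler_type::temperature},
    };
    std::unordered_map<std::string, sampler_type> names = canonical;
    for (const auto & kv : canonical) {
        std::string alias = kv.first;
        std::replace(alias.begin(), alias.end(), '_', '-'); // top_k -> top-k
        names.insert({alias, kv.second}); // no-op if alias equals the canonical name
    }
    return names;
}

// Look a name up without regard to case; unknown names map to `none`.
static sampler_type sampler_from_name(const std::string & name) {
    static const std::unordered_map<std::string, sampler_type> names = build_name_map();
    const auto it = names.find(to_lower(name));
    return it == names.end() ? sampler_type::none : it->second;
}
```

Under these assumptions `sampler_from_name("Top-K")` and `sampler_from_name("top_k")` resolve to the same sampler, which is the behaviour the UI fix relies on.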

`vb9551`

Compare Source

Details

kv-cache : avoid kv cells copies (#24277)

`vb9550`

Compare Source

Details

kv-cache: follow the source cache size when sharing cells (#24267)

A fitted target context can end up smaller than the draft default; the
oversized assistant views then overflow the shared K/V tensors and trip
the ggml_view_4d size assert during graph reserve.

macOS/iOS: