Update dependency ggml-org/llama.cpp to v9566 #222
Open
renovate[bot] wants to merge 1 commit into
This PR contains the following updates:
b9066 → b9566
Release Notes
ggml-org/llama.cpp (ggml-org/llama.cpp)
vb9566
Compare Source
Details
graph: guard iswa kq_mask on its own buffer (#24294)
A SWA-only draft head (e.g. StepFun MTP) leaves the base sub-cache
empty, so its kq_mask buffer stays null and asserts at load. Guard
each mask on its own buffer in set_input and can_reuse, base and swa.
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
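
The guard described above can be sketched as follows. This is a minimal illustration, not the llama.cpp code: the `mask_tensor` and `iswa_input` types and both function bodies are hypothetical stand-ins, and only the idea of checking each mask's own buffer before touching it comes from the commit message.

```cpp
#include <vector>

// Hypothetical stand-in for a mask tensor. `buffer` stays null when the
// matching sub-cache is empty and nothing was allocated for it.
struct mask_tensor {
    void * buffer = nullptr;
    std::vector<float> data;
};

// Hypothetical input holder with one mask per sub-cache (base and SWA).
struct iswa_input {
    mask_tensor base_mask;
    mask_tensor swa_mask;
};

// Fill one mask, but only if its own buffer exists.
inline bool fill_if_allocated(mask_tensor & m, float value) {
    if (m.buffer == nullptr) {
        return false; // empty sub-cache: skip instead of asserting
    }
    for (float & v : m.data) {
        v = value;
    }
    return true;
}

// Each mask is guarded independently, so a SWA-only model still works.
// Returns the number of masks that were actually written.
inline int set_input(iswa_input & inp, float value) {
    int n_set = 0;
    n_set += fill_if_allocated(inp.base_mask, value) ? 1 : 0;
    n_set += fill_if_allocated(inp.swa_mask,  value) ? 1 : 0;
    return n_set;
}
```

With a SWA-only input, `set_input` writes the SWA mask and silently skips the base mask rather than failing on the null buffer.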
vb9565
Compare Source
Details
[ggml-webgpu] Handle buffer overlap / buffer aliasing for concat operator (#24000)
Only run webgpu CI on my fork
Add webgpu only workflow
handle buffer overlap case for concat operator
restore build-webgpu.yml
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Run clang-format
Update ggml/src/ggml-webgpu/wgsl-shaders/concat.wgsl
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Reese Levine <reeselevine1@gmail.com>
vb9564
Compare Source
Details
[ggml-webgpu] Implement 2D workgroups for scale, binary, and unary ops (#24044)
Only run webgpu CI on my fork
Add webgpu only workflow
Implement 2d workgroups for more operations
fix
Fix type
Move back to global_invocation_id
vb9562
Compare Source
Details
mtmd : add video input support (#24269)
wip
ok: lazy bitmap API
remember to free lazy text
wip
add mtmd_helper_video
support video input on server (base64 input)
add MTMD_VIDEO config
add timestamp
update CLI
cli: allow auto-completion for video
add --video arg
fix build
update docs
rename as suggested
vb9561
Compare Source
Details
sync : ggml
vb9559
Compare Source
Details
cli: fix spinner not show during prompt processing (#24283)
vb9558
Compare Source
Details
vulkan: Use cm2 decode_vector for mul_mat_id B matrix loads (#23991)
This allows vec4 loads of the B elements. Also increase BK to 64 when this is
enabled. Neither of these alone is consistently faster, but together these give
a nice speedup.
In ggml-vulkan.cpp, we need to make sure the B matrix alignment and stride are
multiples of 4.
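
The multiple-of-4 requirement can be illustrated with a small check. This is only a sketch under assumptions: the function name is hypothetical, and it assumes both values are counted in elements of B, which the release note does not specify.

```cpp
#include <cstddef>

// Hypothetical helper, not taken from ggml-vulkan.cpp.
// Assumption: offset and stride are measured in elements of B.
// A vec4 load reads 4 consecutive elements, so both the starting
// offset and the row stride must be multiples of 4.
constexpr bool can_use_vec4_loads(std::size_t offset_elems, std::size_t stride_elems) {
    return offset_elems % 4 == 0 && stride_elems % 4 == 0;
}
```

A caller would fall back to the scalar load path whenever this returns false.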
vb9557
Compare Source
Details
cuda: reset cuda context after reading memory size (#23935)
cuda: reset device in get_memory function if no backend is active
also count device and host buffers
exclude hip and musa from counting and device reset
use device mutex instead of atomic
undo backend_free function move
vb9556
Compare Source
Details
HIP: add gfx1152 and gfx1153 to RDNA3.5 (#24129)
vb9555
Compare Source
Details
metal : fix im2col 1D case (audio models) (#24220)
vb9553
Compare Source
Details
common : relax sampler name matching (#23744)
Currently, in some cases, the alternative names for samplers (like
top-k and min-p instead of the canonical top_k and min_p) are
not always recognized by the common_sampler_types_from_names function in
common/sampling.cpp.
This PR changes the signature of this function to remove the
bool allow_alt_names flag, and removes all occurrences of the flag from callsites. Therefore, the function will now always match all known names.
I also changed the logic of the function to unconditionally check the
provided sampler names against both the canonical and alternative names,
and to be case-insensitive.
This fixes an issue I was seeing wherein samplers specified in the
llama-server UI were not recognized as valid when the alternative
names were used.
add more alt names
cont. fix
cast to unsigned char for correctness
common : unify sampler name mapping
annotate canonical vs. alt sampler name mappings per @CISC
Update common/sampling.cpp
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
common : auto-generate sampler name aliases per @ngxson
use merged map for matching
use .merge instead of iterating
nit: simplify comment
nit: use insert everywhere, not index assignment
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
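
The matching behavior described in this release (canonical and alternative names in one map, compared case-insensitively) can be sketched as below. This is an illustration under assumptions, not the code in common/sampling.cpp: the `sampler_type` enum, `to_lower`, and `sampler_from_name` are hypothetical names, and only the two samplers named in the release note are listed.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <unordered_map>

// Hypothetical subset of sampler kinds, for illustration only.
enum class sampler_type { none, top_k, min_p };

// Lower-case a copy of the input. The cast to unsigned char matters:
// passing a negative char value to std::tolower is undefined behavior.
inline std::string to_lower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
        return static_cast<char>(std::tolower(c));
    });
    return s;
}

// Look a name up in one merged map that holds both the canonical
// spelling (top_k) and the alternative spelling (top-k).
inline sampler_type sampler_from_name(const std::string & name) {
    static const std::unordered_map<std::string, sampler_type> names = {
        { "top_k", sampler_type::top_k }, { "top-k", sampler_type::top_k },
        { "min_p", sampler_type::min_p }, { "min-p", sampler_type::min_p },
    };
    const auto it = names.find(to_lower(name));
    return it == names.end() ? sampler_type::none : it->second;
}
```

Because every lookup goes through the same map, there is no flag to forget at a call site, which is the failure mode the release note describes.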
vb9551
Compare Source
Details
kv-cache : avoid kv cells copies (#24277)
vb9550
Compare Source
Details
kv-cache: follow the source cache size when sharing cells (#24267)
A fitted target context can end up smaller than the draft default. The
oversized assistant views then overflow the shared K/V tensors and trip
the ggml_view_4d size assert during graph reserve.
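
The sizing rule can be sketched in a few lines. This is a simplified model under assumptions, not the kv-cache code: both function names are hypothetical, and cache sizes are reduced to plain cell counts.

```cpp
#include <cstdint>

// Hypothetical check: a view of n_view cells fits in a tensor of n_cells.
constexpr bool view_fits(uint32_t n_view, uint32_t n_cells) {
    return n_view <= n_cells;
}

// Hypothetical sizing rule: a cache that shares cells with a source cache
// takes the source's size instead of its own default.
constexpr uint32_t resolve_cells(uint32_t own_default, bool shares_cells, uint32_t source_cells) {
    return shares_cells ? source_cells : own_default;
}
```

For example, with a draft default of 8192 cells and a fitted target of 4096, a view sized from the default does not fit, while one sized through `resolve_cells` does.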
vb9549
Compare Source
Details
llama : add Gemma4 MTP (#23398)
vb9548
Compare Source
Details
spec : fix vocab compatibility check (#24256)
vb9547
Compare Source
Details
arg: Skip mmproj download when user supplied mmproj (#24239)
Configuration
📅 Schedule: (UTC)
🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.
♻ Rebasing: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.
🔕 Ignore: Close this PR and you won't be reminded about this update again.
This PR was generated by Mend Renovate. View the repository job log.