Skip to content

Update dependency ggml-org/llama.cpp to v9542 - autoclosed#215

Closed
renovate[bot] wants to merge 1 commit into
mainfrom
renovate/ggml-org-llama.cpp-9542.x
Closed

Update dependency ggml-org/llama.cpp to v9542 - autoclosed#215
renovate[bot] wants to merge 1 commit into
mainfrom
renovate/ggml-org-llama.cpp-9542.x

Conversation

@renovate

@renovate renovate Bot commented Jun 6, 2026

Copy link
Copy Markdown
Contributor

ℹ️ Note

This PR body was truncated due to platform limits.

This PR contains the following updates:

Package Update Change
ggml-org/llama.cpp major b9066b9542

Release Notes

ggml-org/llama.cpp (ggml-org/llama.cpp)

vb9542

Compare Source

Details

completion : remove useless statics (#​24226)

Signed-off-by: Adrien Gallouët angt@huggingface.co

macOS/iOS:

Linux:

Android:

Windows:

openEuler:

  • DISABLED
  • openEuler x86 (310p)
  • openEuler x86 (910b, ACL Graph)
  • openEuler aarch64 (310p)
  • openEuler aarch64 (910b, ACL Graph)

UI:

vb9541

Compare Source

Details

completion : fix format specifier in LOG_INF (#​24213)

Signed-off-by: Adrien Gallouët angt@huggingface.co

macOS/iOS:

Linux:

Android:

Windows:

openEuler:

  • DISABLED
  • openEuler x86 (310p)
  • openEuler x86 (910b, ACL Graph)
  • openEuler aarch64 (310p)
  • openEuler aarch64 (910b, ACL Graph)

UI:

vb9538

Compare Source

Details

model : rename local n_layer_all variable (#​24209)

macOS/iOS:

Linux:

Android:

Windows:

openEuler:

  • DISABLED
  • openEuler x86 (310p)
  • openEuler x86 (910b, ACL Graph)
  • openEuler aarch64 (310p)
  • openEuler aarch64 (910b, ACL Graph)

UI:

vb9537

Compare Source

Details

context : fix off-by-one comparisons to n_gpu_layers (#​24208)

macOS/iOS:

Linux:

Android:

Windows:

openEuler:

  • DISABLED
  • openEuler x86 (310p)
  • openEuler x86 (910b, ACL Graph)
  • openEuler aarch64 (310p)
  • openEuler aarch64 (910b, ACL Graph)

UI:

vb9536

Compare Source

Details

opencl: improve get_rows, cpy, concat and q6_k flat gemv (#​24160)

  • opencl: allow multiple workgroups for large rows

  • opencl: improve small cpy

  • opencl: packed concat for small input

  • opencl: tweak flat q6_K gemv, increase N_DST and remap threads

macOS/iOS:

Linux:

Android:

Windows:

openEuler:

  • DISABLED
  • openEuler x86 (310p)
  • openEuler x86 (910b, ACL Graph)
  • openEuler aarch64 (310p)
  • openEuler aarch64 (910b, ACL Graph)

UI:

vb9535

Compare Source

Details

common/chat : unify and fix LFM2/LFM2.5 tool parser (#​24178)

macOS/iOS:

Linux:

Android:

Windows:

openEuler:

  • DISABLED
  • openEuler x86 (310p)
  • openEuler x86 (910b, ACL Graph)
  • openEuler aarch64 (310p)
  • openEuler aarch64 (910b, ACL Graph)

UI:

vb9534

Compare Source

Details

vulkan: add fwht support for Intel with shmem reduction (#​23964)

  • vulkan: add fwht support for Intel with shmem reduction

  • don't use N as workgroup size

  • disable subgroup shuffle on MoltenVK AMD

  • disable fwht shader on Intel Windows due to driver bug

macOS/iOS:

Linux:

Android:

Windows:

openEuler:

  • DISABLED
  • openEuler x86 (310p)
  • openEuler x86 (910b, ACL Graph)
  • openEuler aarch64 (310p)
  • openEuler aarch64 (910b, ACL Graph)

UI:

vb9533

Compare Source

Details

model: fix build failed (#​24193)

macOS/iOS:

Linux:

Android:

Windows:

openEuler:

  • DISABLED
  • openEuler x86 (310p)
  • openEuler x86 (910b, ACL Graph)
  • openEuler aarch64 (310p)
  • openEuler aarch64 (910b, ACL Graph)

UI:

vb9531

Compare Source

Details

TP: round up granularity to 128 (#​24180)

  • TP: round up granularity to 128

  • remove assert

macOS/iOS:

Linux:

Android:

Windows:

openEuler:

  • DISABLED
  • openEuler x86 (310p)
  • openEuler x86 (910b, ACL Graph)
  • openEuler aarch64 (310p)
  • openEuler aarch64 (910b, ACL Graph)

UI:

vb9530

Compare Source

Details

cli: fix model params not propagated (#​23893)

Fixes #​23847

macOS/iOS:

Linux:

Android:

Windows:

openEuler:

  • DISABLED
  • openEuler x86 (310p)
  • openEuler x86 (910b, ACL Graph)
  • openEuler aarch64 (310p)
  • openEuler aarch64 (910b, ACL Graph)

UI:

vb9529

Compare Source

Details

model : fix llama_model::n_gpu_layers() (#​24188)

macOS/iOS:

Linux:

Android:

Windows:

openEuler:

  • DISABLED
  • openEuler x86 (310p)
  • openEuler x86 (910b, ACL Graph)
  • openEuler aarch64 (310p)
  • openEuler aarch64 (910b, ACL Graph)

UI:

vb9528

Compare Source

Details

ui: run npm install when package-lock.json is newer than node_modules (#​24171)

macOS/iOS:

Linux:

Android:

Windows:

openEuler:

  • DISABLED
  • openEuler x86 (310p)
  • openEuler x86 (910b, ACL Graph)
  • openEuler aarch64 (310p)
  • openEuler aarch64 (910b, ACL Graph)

UI:

vb9524

Compare Source

Details

minor : fix lint issues (#​24165)

macOS/iOS:

Linux:

Android:

Windows:

openEuler:

  • DISABLED
  • openEuler x86 (310p)
  • openEuler x86 (910b, ACL Graph)
  • openEuler aarch64 (310p)
  • openEuler aarch64 (910b, ACL Graph)

UI:

vb9523

Compare Source

Details

hparams : refactor hparams.n_layer (#​24060)

  • hparams : refactor hparams.n_layer

  • cont : remove n_layer_kv(), use n_layer_all instead

  • cont : type consistency

  • pi : update SYSTEM.md

  • models : fix Step3.5 MTP

  • cont : remove duplicate switch cases

  • cont : explicitly set false to extra layers for is_swa and is_recr

  • cont : fix nextn layer count handling

Co-authored-by: Sigbjørn Skjæret sigbjorn.skjaeret@scala.com


Co-authored-by: Sigbjørn Skjæret sigbjorn.skjaeret@scala.com

macOS/iOS:

Linux:

Android:

Windows:

openEuler:

  • DISABLED
  • openEuler x86 (310p)
  • openEuler x86 (910b, ACL Graph)
  • openEuler aarch64 (310p)
  • openEuler aarch64 (910b, ACL Graph)

UI:

vb9522

Compare Source

Details

kleidiai : dynamic chunck-based scheduling for hybrid execution (#​23819)

macOS/iOS:

Linux:

Android:

Windows:

openEuler:

  • DISABLED
  • openEuler x86 (310p)
  • openEuler x86 (910b, ACL Graph)
  • openEuler aarch64 (310p)
  • openEuler aarch64 (910b, ACL Graph)

UI:

vb9521

Compare Source

Details

CUDA: enroll mul_mat_vec_q_moe into pdl (#​24087)

  • Enroll mul_mat_vec_q_moe into PDL, boosting MTP performance on BW

Data collected on a B4500:

Before

(llama.cpp) ➜  llama.cpp git:(master) ✗ python mtp-bench.py
  code_python        pred= 192 draft= 150 acc= 116 rate=0.773 tok/s=202.8
  code_cpp           pred= 192 draft= 147 acc= 117 rate=0.796 tok/s=212.8
  explain_concept    pred= 192 draft= 161 acc= 110 rate=0.683 tok/s=196.4
  summarize          pred= 192 draft= 138 acc= 122 rate=0.884 tok/s=226.6
  qa_factual         pred= 192 draft= 138 acc= 121 rate=0.877 tok/s=225.1
  translation        pred= 192 draft= 158 acc= 112 rate=0.709 tok/s=201.5
  creative_short     pred= 192 draft= 160 acc= 110 rate=0.688 tok/s=197.2
  stepwise_math      pred= 192 draft= 150 acc= 115 rate=0.767 tok/s=209.2
  long_code_review   pred= 192 draft= 148 acc= 116 rate=0.784 tok/s=208.9

After

(llama.cpp) ➜  llama.cpp git:(master) ✗ python mtp-bench.py
  code_python        pred= 192 draft= 150 acc= 116 rate=0.773 tok/s=211.9
  code_cpp           pred= 192 draft= 147 acc= 117 rate=0.796 tok/s=224.6
  explain_concept    pred= 192 draft= 161 acc= 110 rate=0.683 tok/s=207.8
  summarize          pred= 192 draft= 138 acc= 122 rate=0.884 tok/s=240.2
  qa_factual         pred= 192 draft= 138 acc= 121 rate=0.877 tok/s=238.5
  translation        pred= 192 draft= 158 acc= 112 rate=0.709 tok/s=213.4
  creative_short     pred= 192 draft= 160 acc= 110 rate=0.688 tok/s=208.8
  stepwise_math      pred= 192 draft= 150 acc= 115 rate=0.767 tok/s=221.7
  long_code_review   pred= 192 draft= 148 acc= 116 rate=0.784 tok/s=220.7

Server launched with:

➜  llama.cpp git:(osimons/enroll_mul_mat_vec_q_moe_into_PDL) ✗ ./build-x64-linux-gcc-reldbg/bin/llama-server \
    -m /mnt/share/gguf/unsloth/Qwen3.6-35B-A3B-MTP-GGUF/Qwen3.6-35B-A3B-UD-Q4_K_M.gguf -dio \
    --spec-type draft-mtp \
    --spec-draft-n-max 2 \
    -ngl all \
    -fa on \
    --host 0.0.0.0 \
    --port 8080 -np 1 --chat-template-kwargs "{\"preserve_thinking\": true}"
  • LC to overlap with following kernels

macOS/iOS:

Note

PR body was truncated to here.


Configuration

📅 Schedule: (UTC)

  • Branch creation
    • At any time (no schedule defined)
  • Automerge
    • At any time (no schedule defined)

🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.

Rebasing: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 Ignore: Close this PR and you won't be reminded about this update again.


  • If you want to rebase/retry this PR, check this box

This PR was generated by Mend Renovate. View the repository job log.

@renovate renovate Bot changed the title Update dependency ggml-org/llama.cpp to v9542 Update dependency ggml-org/llama.cpp to v9542 - autoclosed Jun 6, 2026
@renovate renovate Bot closed this Jun 6, 2026
@renovate renovate Bot deleted the renovate/ggml-org-llama.cpp-9542.x branch June 6, 2026 21:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

0 participants