Gemma4 MTP by am17an · Pull Request #17 · am17an/llama.cpp

am17an · 2026-05-19T15:56:42Z

Works with both gemma-31B and gemma-26B, but the MoE model is slower. I see a good speedup (~2-2.5x) on my DGX Spark with the dense model. The main problem is sharing the memory ctx between the two llama_contexts, so the current approach is pretty hacky, and the ubatch splitting is not very clean either.

Replicated the AIME-26 results for Gemma-31B with -np 4

am17an · 2026-05-19T16:58:01Z

+    // of streams (one per active draft seq); q->ne[2] is not divisible by the full
+    // n_stream and the view collapses tokens. Slice k/v down to exactly the streams
+    // referenced by this ubatch. Requires those streams to form a contiguous range.
+    if (k->ne[3] > 1 && (uint32_t) k->ne[3] != ubatch.n_seqs_unq) {


@ggerganov this part
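
For readers following the review comment, the "contiguous range" requirement in the diff can be illustrated with a small standalone sketch. It is not code from this PR: `contiguous_stream_range` and its parameters are hypothetical names, and it assumes stream ids are plain unsigned integers. It only shows how one might check that the streams referenced by a ubatch form a single range `[first, first + count)` before slicing k/v to it.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of llama.cpp): given the stream ids referenced
// by a batch, report whether the unique ids form one contiguous range.
// On success, `first` and `count` describe the range [first, first + count).
static bool contiguous_stream_range(std::vector<uint32_t> ids, uint32_t & first, uint32_t & count) {
    if (ids.empty()) {
        return false; // nothing referenced, so there is no range to slice
    }

    // Reduce to sorted unique ids.
    std::sort(ids.begin(), ids.end());
    ids.erase(std::unique(ids.begin(), ids.end()), ids.end());

    // Sorted unique ids are contiguous iff the span equals the number of ids.
    if (ids.back() - ids.front() + 1 != (uint32_t) ids.size()) {
        return false;
    }

    first = ids.front();
    count = (uint32_t) ids.size();
    return true;
}
```

With ids `{2, 3, 4, 3}` this yields `first = 2`, `count = 3`; with `{0, 2}` it fails, which corresponds to the case the diff comment says is not supported.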


ggerganov · 2026-05-21T08:47:44Z

@am17an Are these AIME results with default thinking, or did you set a reasoning budget?

am17an · 2026-05-21T08:55:00Z

Just the default, no budget

am17an added 2 commits May 19, 2026 21:04

llama: Gemma 4 MTP

c5cf6a8

fix multi-seq

154eba0

github-actions Bot added examples python server model labels May 19, 2026


am17an and others added 2 commits May 20, 2026 23:41

add assert that draft + shared kv should be on same device

a752e1b

common/speculative : fix nullptr crash in get_devices_str

a03120c

ggml_backend_dev_by_name always appends a nullptr sentinel to the devices vector. Skipping nullptr entries prevents assertion failure in ggml_backend_dev_name. Assisted-by: llama.cpp:local pi
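
A minimal sketch of the pattern this commit message describes, assuming a device list that may contain a trailing nullptr sentinel. `fake_device` and `devices_to_str` are hypothetical stand-ins for illustration, not the real ggml types or the actual `get_devices_str` implementation.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a backend device handle; not the real ggml type.
struct fake_device {
    const char * name;
};

// Join device names with commas, skipping nullptr entries such as a
// trailing sentinel, so a null handle is never dereferenced.
static std::string devices_to_str(const std::vector<const fake_device *> & devices) {
    std::string result;
    for (const fake_device * dev : devices) {
        if (dev == nullptr) {
            continue; // sentinel or empty slot: nothing to print
        }
        if (!result.empty()) {
            result += ",";
        }
        result += dev->name;
    }
    return result;
}
```

Skipping the null entry inside the loop keeps the caller free to pass the sentinel-terminated vector unchanged.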

am17an force-pushed the gemma4-mtp branch from cd2e5b2 to a03120c Compare May 20, 2026 16:28

am17an wants to merge 4 commits into master from gemma4-mtp


2 participants
