mtmd : add Apple CoreML backend for vision encoding by tc-mb · Pull Request #24163 · ggml-org/llama.cpp

tc-mb · 2026-06-05T08:06:00Z

Overview

Add an Apple CoreML backend to libmtmd for offloading vision encoder
inference to Apple Neural Engine (ANE) or GPU.

An initial version of this work was previously submitted as #15262. Since
then, llama.cpp has restructured multimodal support into libmtmd, so
this PR is a ground-up rewrite that integrates CoreML as a first-class
backend within the new architecture.

The entire ViT + merger pipeline runs as a single compiled `.mlmodelc`
bundle; there are no per-op CoreML calls.

Changes from #15262:

- Integrated into libmtmd via the adapter pattern
- `metadata.json`-driven model discovery (no hard-coded architecture strings)
- Includes an export tool (`export_coreml.py`) to convert HF checkpoints

Main changes

- New `GGML_COREML` CMake option (Apple-only, OFF by default)
- `coreml/backend.{h,mm}`: generic CoreML runtime (load / unload / predict_single_output)
- Adapter registry: `coreml/models/` with a detect / setup / encode_slice vtable
- MiniCPM-V adapter (`coreml/models/minicpmv.cpp`): supports v4.0 / v4.5 / v4.6
- `mtmd-coreml.{h,cpp}`: `mtmd_coreml::context` lifecycle, metadata parsing, adapter dispatch
- Wired into `mtmd_context` via the `--coreml <path>` CLI flag
- Export tool (`coreml/export_coreml.py`): standalone Python script, HF checkpoint → `.mlpackage`
- Shared SigLIP model definitions (`coreml/models/modeling_siglip.py`)
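The adapter registry can be pictured with a short Python sketch. It only mirrors the shape of the detect / setup / encode_slice vtable; the PR's registry lives in `coreml/models/` on the C++ side, and the `Adapter` class, `find_adapter` helper, and `model_family` metadata key below are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass(frozen=True)
class Adapter:
    """Hypothetical Python mirror of the detect / setup / encode_slice vtable."""
    name: str
    detect: Callable[[dict], bool]                    # can this adapter handle the bundle?
    setup: Callable[[dict], dict]                     # derive runtime params from metadata
    encode_slice: Callable[[dict, list], List[list]]  # encode one image slice


def find_adapter(registry: Sequence[Adapter], metadata: dict) -> Optional[Adapter]:
    """Return the first adapter whose detect() accepts the metadata, else None."""
    for adapter in registry:
        if adapter.detect(metadata):
            return adapter
    return None


def _stub_encode(params: dict, pixels: list) -> List[list]:
    # Placeholder: returns zeros shaped [n_tokens][llm_embed_dim] instead of running a model.
    return [[0.0] * params["llm_embed_dim"] for _ in range(params["n_tokens"])]


# Hypothetical MiniCPM-V adapter keyed on an assumed "model_family" metadata field.
MINICPMV = Adapter(
    name="minicpmv",
    detect=lambda meta: meta.get("model_family") == "minicpmv",
    setup=lambda meta: {"n_tokens": int(meta["n_tokens"]),
                        "llm_embed_dim": int(meta["llm_embed_dim"])},
    encode_slice=_stub_encode,
)

REGISTRY: List[Adapter] = [MINICPMV]

if __name__ == "__main__":
    meta = {"model_family": "minicpmv", "n_tokens": 64, "llm_embed_dim": 8}
    adapter = find_adapter(REGISTRY, meta)
    out = adapter.encode_slice(adapter.setup(meta), [])
    print(adapter.name, len(out), len(out[0]))  # minicpmv 64 8
```

With this shape, supporting another model family means appending one entry to the registry rather than touching the dispatch code.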

Input / output contract

| Role | Name | Dtype | Shape |
|---|---|---|---|
| Input | `pixel_values` | float32 | `[1, 3, 14, 14 × max_patches]` |
| Input | `patch_w` | int32 | `[1]` |
| Output | `output` | float32 | `[1, n_tokens, llm_embed_dim]` |

Adapters discover n_tokens and llm_embed_dim from the compiled model's
metadata.json at load time.
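A minimal Python sketch of this contract follows. It assumes that the `14 × max_patches` width means patches are laid side by side along the last axis, and that `metadata.json` exposes top-level `n_tokens` and `llm_embed_dim` keys. Both the key names and the helper functions are hypothetical; the metadata layout actually written by `export_coreml.py` may differ.

```python
import json
from pathlib import Path
from typing import Tuple

PATCH_SIZE = 14  # from the [1, 3, 14, 14 x max_patches] input shape


def pixel_values_shape(max_patches: int) -> Tuple[int, int, int, int]:
    """Input shape with patches laid side by side along the width axis (assumed)."""
    if max_patches <= 0:
        raise ValueError("max_patches must be positive")
    return (1, 3, PATCH_SIZE, PATCH_SIZE * max_patches)


def parse_contract(metadata_text: str) -> Tuple[int, int]:
    """Read (n_tokens, llm_embed_dim) from metadata.json text (hypothetical keys)."""
    meta = json.loads(metadata_text)
    return int(meta["n_tokens"]), int(meta["llm_embed_dim"])


def load_contract(bundle_dir: str) -> Tuple[int, int]:
    """Load the contract from <bundle_dir>/metadata.json."""
    return parse_contract((Path(bundle_dir) / "metadata.json").read_text(encoding="utf-8"))


def output_shape(n_tokens: int, llm_embed_dim: int) -> Tuple[int, int, int]:
    """Expected shape of the float32 `output` tensor."""
    return (1, n_tokens, llm_embed_dim)
```

Reading the dimensions at load time lets the same runtime code size its output buffer for any bundle without per-model constants.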

Testing

Exported MiniCPM-V 4.6 float32 .mlpackage and verified accuracy against PyTorch:
- max diff: 2.80e-05, mean diff: 7.59e-07
Exported MiniCPM-V 4.6 float16 .mlpackage:
- max diff: 6.37e-02, mean diff: 2.13e-03
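The reported numbers can be reproduced in spirit with a small helper, assuming "diff" means the element-wise absolute difference between the flattened CoreML and PyTorch outputs. `diff_stats` is a hypothetical name and is not part of the export tool.

```python
from typing import Sequence, Tuple


def diff_stats(reference: Sequence[float], candidate: Sequence[float]) -> Tuple[float, float]:
    """Return (max, mean) of the element-wise absolute difference of two flat tensors."""
    if len(reference) != len(candidate):
        raise ValueError(f"length mismatch: {len(reference)} vs {len(candidate)}")
    if not reference:
        raise ValueError("cannot compare empty tensors")
    diffs = [abs(r - c) for r, c in zip(reference, candidate)]
    return max(diffs), sum(diffs) / len(diffs)
```

For example, `diff_stats([1.0, 2.0, 3.0], [1.0, 2.5, 2.0])` returns `(1.0, 0.5)`. Real outputs of shape `[1, n_tokens, llm_embed_dim]` would be flattened on both sides before comparison.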

Usage

```bash
# Build
cmake -B build -DGGML_COREML=ON
cmake --build build

# Export (Python, once)
pip install coremltools safetensors torch
python tools/mtmd/coreml/export_coreml.py \
    -m /path/to/MiniCPM-V-4.6 --precision float32

# Compile for runtime
xcrun coremlcompiler compile coreml_minicpmv46_vit_all_f32.mlpackage .

# Run
./build/bin/llama-mtmd-cli \
    -m MiniCPM-V-4_6-Q4_K_M.gguf \
    --coreml coreml_minicpmv46_vit_all_f32.mlmodelc \
    --image cat.jpg -p "Describe this image."
```

Signed-off-by: tc-mb <tianchi_cai@icloud.com>

ngxson

Please adapt your code to follow the guidelines in https://github.com/ggml-org/llama.cpp/blob/master/AGENTS.md

ngxson · 2026-06-05T09:47:01Z

```diff
+    // template (overview <image> ... </image> <slice> ... </slice> ...).
+    // The actual numeric variant (4 / 5 / 6) doesn't change slice layout
+    // in mtmd, so we use a single value for all bundles.
+    hp.minicpmv_version   = 4;
```


This is not a future-proof solution. Instead, it's better to check for the substring `coreml_minicpmv40_vit_f16` in `{coreml_dir}/metadata.json`; no JSON parsing is needed.

Yes, agreed, this part isn't well designed. I'll think about how to make it more general.
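ngxson's suggestion amounts to a plain substring test on the raw file. A Python rendering of the idea follows; the runtime in this PR is C++, so this only illustrates the logic, and `metadata_mentions` / `bundle_mentions` are hypothetical helper names.

```python
from pathlib import Path


def metadata_mentions(metadata_text: str, marker: str) -> bool:
    """True if the raw metadata text contains the marker; no JSON parsing."""
    return marker in metadata_text


def bundle_mentions(coreml_dir: str, marker: str) -> bool:
    """Substring check against {coreml_dir}/metadata.json."""
    path = Path(coreml_dir) / "metadata.json"
    return metadata_mentions(path.read_text(encoding="utf-8", errors="replace"), marker)
```

One trade-off worth noting: a bare substring also matches longer names that start with the same marker, so markers should be chosen so that none is a prefix of another.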


tc-mb · 2026-06-05T10:35:02Z

@ngxson Thanks for the review.

One more thing I'd like your input on: export_coreml.py is currently tightly coupled to the MiniCPM-V pipeline; the model class, weight mapping, and metadata are all hardcoded for that one family. Before this PR lands, I'd like to add at least one more model family (e.g. Qwen2VL or another ViT) to prove that the dispatch pattern works and to keep the script general.

Two options:

1. Add a per-model dispatch (`--model minicpmv46`, `--model qwen2vl`, etc.) in this PR, with at least one extra model to show it isn't over-engineered.
2. Land this PR as-is and refactor the export tool in a follow-up PR when the second model arrives.

Which direction do you prefer?
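Option 1 could look roughly like the argparse sketch below. The `--model` values come from the comment above and the `-m` / `--precision` flags from the Usage section; the exporter functions are hypothetical stubs that only return a description instead of converting anything.

```python
import argparse
from typing import Callable, Dict, List, Optional


def export_minicpmv46(model_dir: str, precision: str) -> str:
    # Hypothetical stub: a real exporter would load the HF checkpoint and convert it.
    return f"minicpmv46 from {model_dir} ({precision})"


def export_qwen2vl(model_dir: str, precision: str) -> str:
    # Hypothetical stub for a second model family.
    return f"qwen2vl from {model_dir} ({precision})"


# One entry per supported family; the CLI choices are derived from this table.
EXPORTERS: Dict[str, Callable[[str, str], str]] = {
    "minicpmv46": export_minicpmv46,
    "qwen2vl": export_qwen2vl,
}


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Export a vision encoder to CoreML (sketch)")
    parser.add_argument("--model", required=True, choices=sorted(EXPORTERS))
    parser.add_argument("-m", "--model-dir", required=True, help="HF checkpoint directory")
    parser.add_argument("--precision", choices=["float32", "float16"], default="float32")
    return parser


def main(argv: Optional[List[str]] = None) -> str:
    args = build_parser().parse_args(argv)
    return EXPORTERS[args.model](args.model_dir, args.precision)


if __name__ == "__main__":
    print(main())
```

Deriving `choices` from the exporter table keeps `--help` and argument validation in sync with the families that are actually implemented.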

mtmd : add Apple CoreML backend for vision encoding

e77b850

Signed-off-by: tc-mb <tianchi_cai@icloud.com>

tc-mb requested review from a team as code owners June 5, 2026 08:06

github-actions bot added the examples and python (python script changes) labels Jun 5, 2026

tc-mb mentioned this pull request Jun 5, 2026

Apple NPU acceleration integrated into llama.cpp, using MiniCPM-V 4.0 as an example. #15262 (closed)

ngxson reviewed Jun 5, 2026


tc-mb added 3 commits June 5, 2026 17:59

mtmd : rename GGML_COREML to MTMD_COREML, move coreml sources to coreml/ dir

aca29ea

Signed-off-by: tc-mb <tianchi_cai@icloud.com>

mtmd : drop --coreml flag, auto-detect CoreML bundle from --mmproj path

5a4c53f

Signed-off-by: tc-mb <tianchi_cai@icloud.com>

mtmd : derive minicpmv_version from CoreML bundle metadata instead of hardcoding

9c103c1

Signed-off-by: tc-mb <tianchi_cai@icloud.com>

tc-mb force-pushed the coreml-mtmd-v2 branch from eaeb136 to 9c103c1 June 5, 2026 10:28

tc-mb wants to merge 4 commits into ggml-org:master from tc-mb:coreml-mtmd-v2

2 participants
