mtmd : add Apple CoreML backend for vision encoding #24163

Open
tc-mb wants to merge 4 commits into ggml-org:master from tc-mb:coreml-mtmd-v2


tc-mb commented Jun 5, 2026

Overview

Add an Apple CoreML backend to libmtmd for offloading vision encoder
inference to Apple Neural Engine (ANE) or GPU.

An initial version of this work was previously submitted as #15262. Since
then llama.cpp has restructured multimodal support into libmtmd, so
this PR is a ground-up rewrite that integrates CoreML as a first-class
backend within the new architecture.

The entire ViT + merger pipeline runs as a single compiled .mlmodelc
bundle — no per-op CoreML calls.

Changes from #15262:

  • Integrated into libmtmd via the adapter pattern
  • metadata.json-driven model discovery — no hard-coded architecture strings
  • Includes an export tool (export_coreml.py) to convert HF checkpoints

Main changes

  • New GGML_COREML cmake option (Apple-only, OFF by default)
  • coreml/backend.{h,mm}: generic CoreML runtime — load / unload / predict_single_output
  • Adapter registry: coreml/models/ with detect / setup / encode_slice vtable
  • MiniCPM-V adapter (coreml/models/minicpmv.cpp): supports v4.0 / v4.5 / v4.6
  • mtmd-coreml.{h,cpp}: mtmd_coreml::context lifecycle, metadata parsing, adapter dispatch
  • Wired into mtmd_context via --coreml <path> CLI flag
  • Export tool (coreml/export_coreml.py): standalone Python script, HF checkpoint → .mlpackage
  • Shared SigLIP model definitions (coreml/models/modeling_siglip.py)

Input / output contract

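| Role   | Name         | Dtype   | Shape                        |
|--------|--------------|---------|------------------------------|
| Input  | pixel_values | float32 | [1, 3, 14, 14 × max_patches] |
| Input  | patch_w      | int32   | [1]                          |
| Output | output       | float32 | [1, n_tokens, llm_embed_dim] |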

Adapters discover n_tokens and llm_embed_dim from the compiled model's
metadata.json at load time.
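
As a rough illustration of the contract above, the three tensor shapes can be derived from a handful of integers. This is a minimal sketch, not code from the PR: the `VisionContract` class is hypothetical, and the numbers passed to it are placeholders rather than values from any real bundle.

```python
from dataclasses import dataclass

PATCH_SIZE = 14  # the fixed 14 that appears in the pixel_values shape


@dataclass(frozen=True)
class VisionContract:
    max_patches: int    # assumed to be fixed when the model is exported
    n_tokens: int       # per the PR, discovered from metadata.json
    llm_embed_dim: int  # per the PR, discovered from metadata.json

    def pixel_values_shape(self):
        # float32 input; the last axis is 14 x max_patches per the table
        return (1, 3, PATCH_SIZE, PATCH_SIZE * self.max_patches)

    def patch_w_shape(self):
        # int32 input
        return (1,)

    def output_shape(self):
        # float32 output
        return (1, self.n_tokens, self.llm_embed_dim)


# Placeholder numbers for illustration only; they are not taken from the PR.
contract = VisionContract(max_patches=1024, n_tokens=64, llm_embed_dim=4096)
print(contract.pixel_values_shape())
print(contract.output_shape())
```

A helper like this could be used to size input buffers or to sanity-check a bundle's reported shapes before calling predict.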

Testing

  • Exported MiniCPM-V 4.6 float32 .mlpackage and verified accuracy against PyTorch:
    • max diff: 2.80e-05, mean diff: 7.59e-07
  • Exported MiniCPM-V 4.6 float16 .mlpackage:
    • max diff: 6.37e-02, mean diff: 2.13e-03
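
The PR does not say how "max diff" and "mean diff" were computed. One plausible reading is the maximum and mean of the element-wise absolute difference between the PyTorch and CoreML outputs. The sketch below assumes that definition and assumes both outputs have been flattened into plain lists of floats; `diff_stats` is a hypothetical helper, not part of the export tool.

```python
def diff_stats(reference, candidate):
    """Return (max_abs_diff, mean_abs_diff) for two equal-length sequences."""
    if len(reference) != len(candidate):
        raise ValueError("outputs must have the same number of elements")
    if not reference:
        raise ValueError("outputs must not be empty")
    abs_diffs = [abs(r - c) for r, c in zip(reference, candidate)]
    return max(abs_diffs), sum(abs_diffs) / len(abs_diffs)


# Toy values, not real model outputs.
max_diff, mean_diff = diff_stats([1.0, 2.0, 3.0], [1.0, 2.5, 2.0])
print(f"max diff: {max_diff:.2e}, mean diff: {mean_diff:.2e}")
```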

Usage

# Build
cmake -B build -DGGML_COREML=ON
cmake --build build

# Export (Python, once)
pip install coremltools safetensors torch
python tools/mtmd/coreml/export_coreml.py \
    -m /path/to/MiniCPM-V-4.6 --precision float32

# Compile for runtime
xcrun coremlcompiler compile coreml_minicpmv46_vit_all_f32.mlpackage .

# Run
./build/bin/llama-mtmd-cli \
    -m MiniCPM-V-4_6-Q4_K_M.gguf \
    --coreml coreml_minicpmv46_vit_all_f32.mlmodelc \
    --image cat.jpg -p "Describe this image."

Signed-off-by: tc-mb <tianchi_cai@icloud.com>

ngxson left a comment

Please adapt your code to follow the guidelines in https://github.com/ggml-org/llama.cpp/blob/master/AGENTS.md

Comment thread common/arg.cpp Outdated
Comment thread tools/mtmd/CMakeLists.txt Outdated
Comment thread tools/mtmd/coreml/mtmd-coreml.cpp
Comment thread tools/mtmd/coreml/mtmd-coreml.h
Comment thread tools/mtmd/coreml/models/minicpmv.cpp Outdated
// template (overview <image> ... </image> <slice> ... </slice> ...).
// The actual numeric variant (4 / 5 / 6) doesn't change slice layout
// in mtmd, so we use a single value for all bundles.
hp.minicpmv_version = 4;
ngxson commented:

This is not a future-proof solution. Instead, it's better to check for the substring coreml_minicpmv40_vit_f16 in {coreml_dir}/metadata.json; no JSON parsing is needed.
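
The substring check suggested here can be sketched as follows. The PR's adapter is C++, so this Python version is only an illustration of the idea: read metadata.json as raw text and look for known bundle names, without parsing JSON. The `find_bundle_marker` helper, the list of markers, and the contents of the stand-in metadata.json are all hypothetical.

```python
import tempfile
from pathlib import Path


def find_bundle_marker(coreml_dir, markers):
    """Return the first marker that occurs in metadata.json, or None."""
    text = Path(coreml_dir, "metadata.json").read_text(
        encoding="utf-8", errors="replace"
    )
    for marker in markers:
        if marker in text:  # plain substring test, no JSON parsing
            return marker
    return None


def demo():
    # Write a stand-in metadata.json; its contents are invented for this example.
    with tempfile.TemporaryDirectory() as coreml_dir:
        Path(coreml_dir, "metadata.json").write_text(
            '[{"name": "coreml_minicpmv40_vit_f16"}]', encoding="utf-8"
        )
        return find_bundle_marker(
            coreml_dir,
            ["coreml_minicpmv46_vit_all_f32", "coreml_minicpmv40_vit_f16"],
        )


print(demo())
```

An adapter could then map the matched marker to the right model variant instead of hard-coding a single value.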

tc-mb replied:

Yes, this piece is not written well enough. I'll think about how to make the design more general.

tc-mb added 3 commits June 5, 2026 17:59
…ml/ dir

Signed-off-by: tc-mb <tianchi_cai@icloud.com>
… hardcoding

Signed-off-by: tc-mb <tianchi_cai@icloud.com>
tc-mb force-pushed the coreml-mtmd-v2 branch from eaeb136 to 9c103c1 June 5, 2026 10:28

tc-mb commented Jun 5, 2026

@ngxson Thanks for the review.

One more thing I'd like your input on: export_coreml.py is currently tightly coupled to the MiniCPM-V pipeline — the model class, weight mapping, and metadata are all hardcoded for that one family. Before this PR lands, I'd like to at least add one additional model family (e.g. Qwen2VL or another ViT) to prove the dispatch pattern works and keep the script general enough.

Two options:

  1. I add a per-model dispatch (--model minicpmv46 / --model qwen2vl etc.) in this PR, with at least one extra model to show it's not over-engineered
  2. I land this PR as-is and refactor the export tool in a follow-up PR when the second model arrives

Which direction do you prefer?
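
For reference, the per-model dispatch in the first option could look roughly like the sketch below. It is not the PR's code. The `-m` and `--precision` flags appear in the Usage section and `--model` is the flag proposed above, but the registry, the exporter functions, the float16 choice, and the default precision are assumptions made for illustration. The exporters only return a description of the work and do not load or convert anything.

```python
import argparse

EXPORTERS = {}  # hypothetical registry: model family name -> exporter function


def register(name):
    """Decorator that adds an exporter to the registry under `name`."""
    def wrap(fn):
        EXPORTERS[name] = fn
        return fn
    return wrap


@register("minicpmv46")
def export_minicpmv46(checkpoint, precision):
    # Placeholder: a real exporter would build the model, map weights,
    # and write the .mlpackage.
    return {"family": "minicpmv46", "checkpoint": checkpoint, "precision": precision}


@register("qwen2vl")
def export_qwen2vl(checkpoint, precision):
    # Placeholder for a second family, to exercise the dispatch.
    return {"family": "qwen2vl", "checkpoint": checkpoint, "precision": precision}


def main(argv=None):
    parser = argparse.ArgumentParser(description="CoreML export dispatch sketch")
    parser.add_argument("-m", dest="checkpoint", required=True,
                        help="path to the HF checkpoint")
    parser.add_argument("--model", required=True, choices=sorted(EXPORTERS))
    parser.add_argument("--precision", default="float32",
                        choices=["float32", "float16"])
    args = parser.parse_args(argv)
    # Dispatch to the exporter registered for the requested family.
    return EXPORTERS[args.model](args.checkpoint, args.precision)
```

With this layout, adding a model family means adding one decorated function. Calling `main(["-m", "/path/to/ckpt", "--model", "qwen2vl"])` would route to `export_qwen2vl`.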
