mtmd : add Apple CoreML backend for vision encoding #24163
Signed-off-by: tc-mb <tianchi_cai@icloud.com>
ngxson
left a comment
please adapt your code to follow guidelines from https://github.com/ggml-org/llama.cpp/blob/master/AGENTS.md
    // template (overview <image> ... </image> <slice> ... </slice> ...).
    // The actual numeric variant (4 / 5 / 6) doesn't change slice layout
    // in mtmd, so we use a single value for all bundles.
    hp.minicpmv_version = 4;
This is not a future-proof solution. Instead, it's better to check for the substring coreml_minicpmv40_vit_f16 in {coreml_dir}/metadata.json; no JSON parsing is needed.
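A minimal sketch of the substring check suggested here, assuming the whole of `{coreml_dir}/metadata.json` can simply be read into a string. The helper names `metadata_mentions` and `coreml_dir_has_marker` are hypothetical and are not taken from the PR:

```cpp
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical helper: true if the raw metadata text contains the marker.
// This is a plain substring search; the text is never parsed as JSON.
static bool metadata_mentions(const std::string & metadata_text, const std::string & marker) {
    return metadata_text.find(marker) != std::string::npos;
}

// Hypothetical helper: reads {coreml_dir}/metadata.json in one go and
// searches it for the marker. Returns false if the file cannot be opened.
static bool coreml_dir_has_marker(const std::string & coreml_dir, const std::string & marker) {
    std::ifstream file(coreml_dir + "/metadata.json", std::ios::binary);
    if (!file) {
        return false;
    }
    std::stringstream buffer;
    buffer << file.rdbuf();
    return metadata_mentions(buffer.str(), marker);
}
```

A caller could then pick the model variant with something like `coreml_dir_has_marker(dir, "coreml_minicpmv40_vit_f16")` instead of hardcoding a version number.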
Yes, this part is not written well enough. I'll think about how to make the design more general.
…ml/ dir Signed-off-by: tc-mb <tianchi_cai@icloud.com>
Signed-off-by: tc-mb <tianchi_cai@icloud.com>
… hardcoding Signed-off-by: tc-mb <tianchi_cai@icloud.com>
@ngxson Thanks for the review. One more thing I'd like your input on: export_coreml.py is currently tightly coupled to the MiniCPM-V pipeline; the model class, weight mapping, and metadata are all hardcoded for that one family. Before this PR lands, I'd like to add at least one additional model family (e.g. Qwen2VL or another ViT) to prove the dispatch pattern works and keep the script general enough. Two options: I add a per-model dispatch (--model minicpmv46 / --model qwen2vl etc.) in this PR, with at least one extra model to show it's not over-engineered
Overview
Add an Apple CoreML backend to libmtmd for offloading vision encoder inference to the Apple Neural Engine (ANE) or GPU.
An initial version of this work was previously submitted as #15262. Since then, llama.cpp has restructured multimodal support into libmtmd, so this PR is a ground-up rewrite that integrates CoreML as a first-class backend within the new architecture.
The entire ViT + merger pipeline runs as a single compiled .mlmodelc bundle, with no per-op CoreML calls.
Changes from #15262:
- … libmtmd via the adapter pattern
- … (export_coreml.py) to convert HF checkpoints
Main changes
- GGML_COREML CMake option (Apple-only, OFF by default)
- coreml/backend.{h,mm}: generic CoreML runtime (load / unload / predict_single_output)
- … coreml/models/ with detect / setup / encode_slice vtable
- … (coreml/models/minicpmv.cpp): supports v4.0 / v4.5 / v4.6
- mtmd-coreml.{h,cpp}: mtmd_coreml::context lifecycle, metadata parsing, adapter dispatch
- … mtmd_context via --coreml <path> CLI flag
- … (coreml/export_coreml.py): standalone Python script, HF checkpoint → .mlpackage
- … (coreml/models/modeling_siglip.py)
Input / output contract
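Each adapter's `encode_slice` is the piece that has to honor this contract. As a rough illustration of how a `detect` / `setup` / `encode_slice` vtable might be dispatched, here is a minimal sketch. Everything in it is an assumption made for illustration: the struct layout, the function signatures, the `"minicpmv"` marker string, and the placeholder sizes are not taken from the PR, and the encode step is a stub that does not call CoreML.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical per-model state filled in by setup().
struct adapter_state {
    int n_tokens      = 0;
    int llm_embed_dim = 0;
};

// Hypothetical vtable: one entry per supported model family.
struct adapter_vtable {
    const char * name;
    bool (*detect)(const std::string & metadata_text);
    bool (*setup)(adapter_state & st);
    bool (*encode_slice)(const adapter_state & st,
                         const std::vector<float> & pixel_values,
                         std::vector<float> & out);
};

static bool minicpmv_detect(const std::string & metadata_text) {
    // Assumed marker; a substring search, no JSON parsing.
    return metadata_text.find("minicpmv") != std::string::npos;
}

static bool minicpmv_setup(adapter_state & st) {
    // Placeholder sizes; a real adapter would read them from metadata.
    st.n_tokens      = 64;
    st.llm_embed_dim = 8;
    return true;
}

static bool minicpmv_encode_slice(const adapter_state & st,
                                  const std::vector<float> & /*pixel_values*/,
                                  std::vector<float> & out) {
    // Stub: only sizes the output to n_tokens * llm_embed_dim floats.
    out.assign((size_t) st.n_tokens * (size_t) st.llm_embed_dim, 0.0f);
    return true;
}

static const adapter_vtable k_adapters[] = {
    { "minicpmv", minicpmv_detect, minicpmv_setup, minicpmv_encode_slice },
};

// Returns the first adapter whose detect() accepts the metadata, or nullptr.
static const adapter_vtable * find_adapter(const std::string & metadata_text) {
    for (const auto & a : k_adapters) {
        if (a.detect(metadata_text)) {
            return &a;
        }
    }
    return nullptr;
}
```

Under this layout, supporting another model family would amount to adding one more row to `k_adapters`.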
- pixel_values: [1, 3, 14, 14 × max_patches]
- patch_w: [1]
- output: [1, n_tokens, llm_embed_dim]
Adapters discover n_tokens and llm_embed_dim from the compiled model's metadata.json at load time.
Testing
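When checking an exported bundle against the input / output contract above, it can help to compute the expected tensor shapes and buffer sizes up front. The sketch below does only that arithmetic; the helper names are hypothetical, and any concrete values passed for `max_patches`, `n_tokens`, or `llm_embed_dim` are illustrative rather than taken from a real model.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Patch side length, taken from the pixel_values shape in the contract.
constexpr int64_t k_patch_size = 14;

// pixel_values: [1, 3, 14, 14 × max_patches]
static std::array<int64_t, 4> pixel_values_shape(int64_t max_patches) {
    return {1, 3, k_patch_size, k_patch_size * max_patches};
}

// output: [1, n_tokens, llm_embed_dim]
static std::array<int64_t, 3> output_shape(int64_t n_tokens, int64_t llm_embed_dim) {
    return {1, n_tokens, llm_embed_dim};
}

// Total number of elements for a given shape.
template <size_t N>
static int64_t n_elements(const std::array<int64_t, N> & shape) {
    int64_t n = 1;
    for (int64_t d : shape) {
        n *= d;
    }
    return n;
}
```

For example, with an illustrative `max_patches` of 1024, `n_elements(pixel_values_shape(1024))` is 3 × 14 × 14 × 1024 = 602112 elements.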
- … .mlpackage and verified accuracy against PyTorch:
- … .mlpackage:
Usage