Finding (2026-06-14)
Evaluated moonshine-ai/moonshine as a candidate on-device engine. Verdict: a good candidate — but for live partials + tiny footprint, not as another accuracy rung. Worth adding behind the existing spine; the cost is a second inference runtime.
What it is
- Streaming-first ASR family, MIT-licensed, designed for edge from day one. English Tiny (26M, 12.66% WER) → Base (58M) → Small Streaming (123M) → Medium Streaming (245M, 6.65% WER), plus 7 other languages. Medium Streaming beats Whisper Large v3 (7.44%) at ~1/6 the params; sub-200ms latency; native streaming with input-encoding caching.
- Runtime: ONNX Runtime (C++ core), not MLX or CoreML. Ships an official Swift package (moonshine-ai/moonshine-swift, SPM) with an example Transcriber iOS app. A Swift API also exists in sherpa-onnx.
Why it fits
- The spine was built for this. Same drop-in shape as Parakeet (T2.5): case moonshine in TranscriptionEngine, a MoonshineModelStore + store(for:) arm in ModelStores, a TranscriberFactory arm, an options case, a settings section.
- Local-first, on-device, not-Apple-only → textbook match for the "on-device ≠ Apple-only" axis (MIT, fully offline, third-party model).
- Closes the one real gap: live partials. Apple Speech is currently the only engine with live partials; Whisper and Parakeet are both finalize-only ("placeholder UX while recording", planning/notes.md). Moonshine's streaming models are purpose-built to feed our existing TranscriptionSession streaming protocol with incremental text → it would be the first non-Apple engine showing words as you speak, which is the core tap→speak→see-it UX.
- Tiny footprint. Models are 26–245 MB vs Parakeet's 2.4 GB / Whisper's 481 MB — a big win on download size/memory; sidesteps the 8 GB jetsam concerns.
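The drop-in shape and the live-partials flow described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the app's actual code: the real signatures of TranscriptionEngine, TranscriberFactory and TranscriptionSession are not shown in this note, so the Transcriber protocol, TranscriptionEvent, FakeMoonshineTranscriber and FinalizeOnlyTranscriber are hypothetical stand-ins, and no moonshine-swift API is called (pre-decoded words stand in for audio chunks).

```swift
// Hypothetical stand-in for the spine's engine enum (other cases omitted).
enum TranscriptionEngine {
    case parakeet
    case moonshine
}

// Assumed event shape: incremental partial text, then one final result.
enum TranscriptionEvent: Equatable {
    case partial(String)
    case final(String)
}

// Hypothetical protocol; `chunks` are pre-decoded words standing in for audio.
protocol Transcriber {
    func transcribe(chunks: [String]) -> AsyncStream<TranscriptionEvent>
}

// Placeholder for a streaming engine: emits a growing partial per chunk.
struct FakeMoonshineTranscriber: Transcriber {
    func transcribe(chunks: [String]) -> AsyncStream<TranscriptionEvent> {
        AsyncStream { continuation in
            var text = ""
            for chunk in chunks {
                text += text.isEmpty ? chunk : " " + chunk
                continuation.yield(.partial(text))
            }
            continuation.yield(.final(text))
            continuation.finish()
        }
    }
}

// Placeholder for a finalize-only engine: nothing until the end.
struct FinalizeOnlyTranscriber: Transcriber {
    func transcribe(chunks: [String]) -> AsyncStream<TranscriptionEvent> {
        AsyncStream { continuation in
            continuation.yield(.final(chunks.joined(separator: " ")))
            continuation.finish()
        }
    }
}

// The new factory arm: one extra case, the rest of the spine is unchanged.
enum TranscriberFactory {
    static func make(for engine: TranscriptionEngine) -> any Transcriber {
        switch engine {
        case .parakeet: return FinalizeOnlyTranscriber()
        case .moonshine: return FakeMoonshineTranscriber()
        }
    }
}

// Helper that drains a stream so callers can inspect what the UI would see.
func collectEvents(from engine: TranscriptionEngine, chunks: [String]) async -> [TranscriptionEvent] {
    var events: [TranscriptionEvent] = []
    for await event in TranscriberFactory.make(for: engine).transcribe(chunks: chunks) {
        events.append(event)
    }
    return events
}
```

With the chunks ["tap", "speak"], the moonshine arm yields two partials before the final text, while the parakeet arm yields only the final text. That difference is the UX gap the streaming engine would close.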
Costs / risks
- A second inference runtime. Existing engines are raw mlx-swift ports (Parakeet = 12 hand-ported files). Moonshine via ONNX Runtime adds a new dependency stack; the ORT iOS static lib inflates the binary (models are small, the runtime isn't). Dilutes the "MLX primary for ASR" stance slightly. Counterpoint: using moonshine-swift/sherpa-onnx avoids a Parakeet-style hand-port, so it is likely less integration code, just a new binary dependency.
- Maturity. moonshine-swift is official and actively released (v0.0.62, June 2026; 46 releases) but young/niche (~7 stars, ~57 commits). Treat like the LocalLLMClient caveat.
- Outside the MLX eviction logic. The "single live MLX engine" eviction in TranscriberFactory assumes MLX; Moonshine sits outside it (like Apple). Low risk given small models, but needs a co-residency check.
- CPU/ORT path, not GPU/ANE by default (optionally CoreML EP). Different perf profile than the Metal-backed MLX engines; sub-200ms claims are CPU-measured.
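One way to picture the co-residency check is the sketch below. It assumes the eviction rule works as described above (loading an MLX engine evicts any other live MLX engine, while non-MLX engines stay resident). EngineRuntime, LoadedEngine and ResidencyTracker are hypothetical names, and the megabyte figures reuse the download sizes quoted in this note as illustrative placeholders, not measured resident memory.

```swift
// Hypothetical runtime tag; only .mlx engines take part in eviction.
enum EngineRuntime {
    case mlx
    case onnxRuntime
    case system   // e.g. Apple Speech
}

struct LoadedEngine: Equatable {
    let name: String
    let runtime: EngineRuntime
    let residentMB: Int   // illustrative placeholder, not a measurement
}

struct ResidencyTracker {
    private(set) var loaded: [LoadedEngine] = []

    // Loads an engine and returns whatever was evicted to make room.
    // Assumed rule: at most one MLX engine is live; others are untouched.
    @discardableResult
    mutating func load(_ engine: LoadedEngine) -> [LoadedEngine] {
        // Reloading the same engine must not double-count it.
        loaded.removeAll { $0.name == engine.name }
        var evicted: [LoadedEngine] = []
        if engine.runtime == .mlx {
            evicted = loaded.filter { $0.runtime == .mlx }
            loaded.removeAll { $0.runtime == .mlx }
        }
        loaded.append(engine)
        return evicted
    }

    // Sum over everything still resident, across all runtimes.
    var totalResidentMB: Int {
        loaded.reduce(0) { $0 + $1.residentMB }
    }

    // The co-residency check: does the combined footprint fit the budget?
    func fits(budgetMB: Int) -> Bool {
        totalResidentMB <= budgetMB
    }
}

let parakeet = LoadedEngine(name: "parakeet", runtime: .mlx, residentMB: 2400)
let whisper = LoadedEngine(name: "whisper", runtime: .mlx, residentMB: 481)
let moonshine = LoadedEngine(name: "moonshine", runtime: .onnxRuntime, residentMB: 245)

var tracker = ResidencyTracker()
tracker.load(parakeet)
tracker.load(moonshine)                      // stays alongside Parakeet
let evicted = tracker.load(whisper)          // evicts Parakeet only
```

The point of the sketch is that Moonshine's footprint adds to whichever MLX engine is live, so the budget check has to sum across runtimes instead of assuming MLX eviction keeps memory bounded.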
Where it slots
Not above Parakeet on raw accuracy (Parakeet 0.6b stays the ceiling). Moonshine adds a new axis: low-latency, small-footprint, live-streaming third-party ASR. Natural fit as a T2.x "streaming on-device" engine. Also a ready-made AlteredCraft post ("MLX vs ONNX Runtime on iPhone"; "the first non-Apple engine with true live partials").
Verify before committing
- moonshine-swift streaming API actually emits incremental partials we can pipe into TranscriptionSession.
- … .app.
- … TranscriberFactory/eviction without co-residency surprises.
From engine-landscape evaluation, 2026-06-14.