Evaluate Moonshine as a 4th on-device engine (streaming/live-partials play, not an accuracy rung)

## Finding (2026-06-14)

Evaluated [moonshine-ai/moonshine](https://github.com/moonshine-ai/moonshine) as a candidate on-device engine. **Verdict: a good candidate — but for live partials + tiny footprint, not as another accuracy rung.** Worth adding behind the existing spine; the cost is a second inference runtime.

## What it is

- **Streaming-first ASR family**, MIT-licensed, designed for edge from day one. English Tiny (26M, 12.66% WER) → Base (58M) → Small Streaming (123M) → **Medium Streaming (245M, 6.65% WER)**, plus 7 other languages. Medium Streaming beats Whisper Large v3 (7.44%) at ~1/6 the params; sub-200ms latency; native streaming with input-encoding caching.
- **Runtime: ONNX Runtime (C++ core)** — *not* MLX or CoreML. Ships an official Swift package ([`moonshine-ai/moonshine-swift`](https://github.com/moonshine-ai/moonshine-swift), SPM) with an example **Transcriber** iOS app. A Swift API also exists in sherpa-onnx.

## Why it fits

1. **The spine was built for this.** Same drop-in shape as Parakeet (T2.5): `case moonshine` in `TranscriptionEngine`, a `MoonshineModelStore` + `store(for:)` arm in `ModelStores`, a `TranscriberFactory` arm, an options case, a settings section.
2. **Local-first, on-device, not-Apple-only** → textbook match for the "on-device ≠ Apple-only" axis (MIT, fully offline, third-party model).
3. **Closes the one real gap: live partials.** Apple Speech is currently the *only* engine with live partials — Whisper and Parakeet are both finalize-only ("placeholder UX while recording", `planning/notes.md`). Moonshine's streaming models are purpose-built to feed our existing `TranscriptionSession` streaming protocol with incremental text → it would be the **first non-Apple engine showing words as you speak**, which is the core tap→speak→see-it UX.
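4. **Tiny footprint.** Models are 26M–245M parameters (roughly 26–245 MB at one byte per weight; on-disk size still to be confirmed) vs Parakeet's 2.4 GB / Whisper's 481 MB. That is a big win on download size and memory, and it sidesteps the jetsam concerns on 8 GB devices.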
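
The drop-in shape from point 1 could look roughly like the sketch below. Only the names `TranscriptionEngine`, `TranscriberFactory`, and `case moonshine` come from this issue; the other cases, the `InferenceRuntime` enum, the `Transcriber` protocol, and `StubTranscriber` are hypothetical stand-ins, and the real types in the app may differ.

```swift
// Sketch only: type bodies are assumptions, not the app's real code.
enum TranscriptionEngine: String, CaseIterable {
    case apple, whisper, parakeet
    case moonshine  // the new arm proposed in this issue
}

/// Hypothetical tag for the runtime behind each engine.
enum InferenceRuntime { case system, mlx, onnxRuntime }

/// Hypothetical minimal surface a transcriber exposes to the UI.
protocol Transcriber {
    var engine: TranscriptionEngine { get }
    var emitsLivePartials: Bool { get }
}

struct StubTranscriber: Transcriber {
    let engine: TranscriptionEngine
    let emitsLivePartials: Bool
}

enum TranscriberFactory {
    /// Maps each engine to its runtime; Moonshine is the only ONNX Runtime arm.
    static func runtime(for engine: TranscriptionEngine) -> InferenceRuntime {
        switch engine {
        case .apple: return .system
        case .whisper, .parakeet: return .mlx
        case .moonshine: return .onnxRuntime
        }
    }

    /// Builds a transcriber; the exhaustive switch forces a decision for every new case.
    static func make(_ engine: TranscriptionEngine) -> Transcriber {
        switch engine {
        case .apple, .moonshine:
            // Moonshine's live partials are still unverified (see the checklist).
            return StubTranscriber(engine: engine, emitsLivePartials: true)
        case .whisper, .parakeet:
            // Finalize-only engines per this issue.
            return StubTranscriber(engine: engine, emitsLivePartials: false)
        }
    }
}
```

With this shape, `TranscriptionEngine.allCases.filter { TranscriberFactory.make($0).emitsLivePartials }` would list the engines able to show words while recording, which is how a settings section might decide where to surface a "live text" hint.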

## Costs / risks

1. **A second inference runtime.** Existing engines are raw `mlx-swift` ports (Parakeet = 12 hand-ported files). Moonshine via ONNX Runtime adds a new dependency stack; the ORT iOS static lib inflates the binary (models are small, the runtime isn't). Dilutes the "MLX primary for ASR" stance slightly. *Counterpoint:* using `moonshine-swift`/sherpa-onnx avoids a Parakeet-style hand-port — likely **less** integration code, just a new binary dependency.
2. **Maturity.** `moonshine-swift` is official and actively released (v0.0.62, June 2026; 46 releases) but young/niche (~7 stars, ~57 commits). Treat it with the same caution as the `LocalLLMClient` caveat.
3. **Outside the MLX eviction logic.** The "single live MLX engine" eviction in `TranscriberFactory` assumes MLX; Moonshine sits outside it (like Apple). Low risk given small models, but needs a co-residency check.
4. **CPU/ORT path, not GPU/ANE by default** (the CoreML execution provider is optional). The perf profile differs from the Metal-backed MLX engines; the sub-200ms claims are CPU-measured.
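
The co-residency check from risk 3 could be prototyped along these lines. This is a sketch under assumptions: `ResidencyTracker`, `LoadedEngine`, and `Runtime` are hypothetical names, and the megabyte figures used with it are the download sizes quoted above, standing in for resident memory that has not been measured.

```swift
// Sketch only: models "single live MLX engine" eviction plus a memory budget check.
enum Runtime { case system, mlx, onnxRuntime }

struct LoadedEngine: Equatable {
    let name: String
    let runtime: Runtime
    let residentMB: Int  // placeholder estimate, not a measurement
}

struct ResidencyTracker {
    var loaded: [LoadedEngine] = []

    /// Loads an engine. An incoming MLX engine evicts any other MLX engine;
    /// non-MLX engines (Apple, Moonshine via ORT) are left resident.
    mutating func load(_ engine: LoadedEngine) {
        loaded.removeAll { $0.name == engine.name }
        if engine.runtime == .mlx {
            loaded.removeAll { $0.runtime == .mlx }
        }
        loaded.append(engine)
    }

    /// Sum of the estimates for everything currently resident.
    var totalResidentMB: Int {
        loaded.reduce(0) { $0 + $1.residentMB }
    }

    /// True when the combined estimate stays within the given budget.
    func fits(budgetMB: Int) -> Bool {
        totalResidentMB <= budgetMB
    }
}
```

Loading Parakeet (2400) and then Moonshine (245) leaves both resident, which is exactly the case the MLX-only eviction never sees; a budget check like `fits(budgetMB:)` is one way to make that combination explicit instead of implicit.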

## Where it slots

Not above Parakeet on raw accuracy (Parakeet 0.6b stays the ceiling). Moonshine adds a **new axis**: low-latency, small-footprint, *live-streaming* third-party ASR. Natural fit as a **T2.x "streaming on-device" engine**. Also a ready-made AlteredCraft post ("MLX vs ONNX Runtime on iPhone"; "the first non-Apple engine with true live partials").

## Verify before committing

- [ ] `moonshine-swift` streaming API actually emits incremental partials we can pipe into `TranscriptionSession`.
- [ ] ORT iOS binary-size hit on the built `.app`.
- [ ] Works behind the existing `TranscriberFactory`/eviction without co-residency surprises.
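
For the first checkbox, a small probe can make "actually emits incremental partials" a pass/fail result. The sketch below assumes updates can be reduced to a text plus an `isFinal` flag; `TranscriptUpdate` and `PartialsProbe` are hypothetical names, and how `moonshine-swift` really delivers its results is the thing still to be verified.

```swift
// Sketch only: records updates from any engine and counts partials seen before the final.
struct TranscriptUpdate: Equatable {
    let text: String
    let isFinal: Bool
}

final class PartialsProbe {
    private(set) var updates: [TranscriptUpdate] = []

    /// Call this from whatever callback or stream the engine exposes.
    func receive(_ update: TranscriptUpdate) {
        updates.append(update)
    }

    /// Number of non-empty updates that arrived before the first final result.
    /// A finalize-only engine yields 0; a streaming engine should yield more.
    var partialCountBeforeFinal: Int {
        let end = updates.firstIndex(where: { $0.isFinal }) ?? updates.endIndex
        return updates[..<end].filter { !$0.text.isEmpty }.count
    }

    var sawLivePartials: Bool {
        partialCountBeforeFinal > 0
    }
}
```

Feeding the probe from each engine over the same short recording gives a like-for-like comparison: the finalize-only engines should report zero, and Moonshine passes the check only if it reports at least one partial before its final text.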

---
_From engine-landscape evaluation, 2026-06-14._
