AlteredCraft · samkeen · Jun 14, 2026 · Jun 15, 2026
diff --git a/CHANGE_LOG.md b/CHANGE_LOG.md
@@ -140,3 +140,8 @@ Entry style: bold lead-in summarizing what shipped, then the *why* / non-obvious
 - **L2.0 device-validated (iPhone 15 Pro Max) — on-device LLM cleanup works; the thesis lands on the first real run.** Two findings. (1) **The QAT build doesn't load.** `mlx-community/gemma-4-E2B-it-qat-4bit` failed with `keyNotFound(language_model.model.layers.15.self_attn.k_proj.weight)`: Gemma 4 uses **KV-cache sharing** (layers 15–34 of 35 reuse earlier layers' K/V, so the checkpoint omits their `k_proj`/`v_proj`), but MLXLLM 3.31.3's `Gemma4Attention` declares a `kProj` Linear for *every* layer (runtime handles sharing via a `sharedKV` path; load-time doesn't), so weight-mapping throws at layer 15. Verified via the two repos' `model.safetensors.index.json`: the QAT build has `k_proj` on layers 0–14 only; the **non-QAT `mlx-community/gemma-4-e2b-it-4bit`** (the library's registered `LLMRegistry.gemma4_e2b_it_4bit` preset) materializes `k_proj` on all 35 layers. **Switched the smoke primary QAT → non-QAT** (one line) — the plan's "prefer `-qat-4bit`" is overridden by tooling (`plan.L2.md` §4). The download + bridge were never at fault (the QAT run downloaded 4.3 GB and mapped weights cleanly through layer 14). (2) **It runs, and runs well.** Non-QAT Gemma 4 E2B loaded in **3.4 s** (resident floor **2.67 GB**) and cleaned the inline sample to a genuinely good result — fillers (`um`/`uh`) removed, doubled words fixed (`the the`→`the`, `is is`→`is`), punctuation + sentence breaks added, `on boarding`→`onboarding`, `theres`→`there's`, **all content preserved, nothing invented** — an early but strong signal for the L2 bet (cleanup mitigates STT mess). Throughput **~23 tok/s** (approx, word-count-derived; precise streamed tok/s deferred to L2.3). **Memory:** generation peak `phys_footprint` **3.02 GB** — *just* under the ~3 GB no-entitlement jetsam ceiling **on a tiny ~50-word output**; a real multi-minute note's larger KV cache will likely cross it, so the entitlement stays justified (re-measure on a long fixture at L2.2/L2.3). **Entitlement resolved (§10 Q3):** the **free Apple ID tier accepts `increased-memory-limit`** — the device build signed, installed, and ran with it; no V1.4 trigger. L2.0 (+ most of L2.1) done. Next: **L2.2** — capture real Apple-Speech transcripts into `cleanup_fixtures.json`, then **L2.3** the head-to-head verdict.
 - **L2 re-sequenced — in-app "Clean up" MVP pulled forward ahead of the fixtures/harness route (code-complete; device-pending).** Since L2.0 proved on-device cleanup works with strong quality, the "harness-first, UI-last" caution was retired and the real feature became the evaluation vehicle (dogfooding real notes beats curated fixtures and yields the transcripts for free). **L2.2 fixtures superseded; L2.3 model A/B deferred** (the `LLMCleanupSmoke` repo-id path still sweeps candidates when wanted). The feature, mirroring the re-transcribe workflow and decoupled from MLX behind the `LanguageModel` protocol: **centralized model management in the Tuning sheet** — `CleanupModelStore` (a `DownloadableModelStore` bound to the new `ModelDownloadSpec.gemmaCleanupE2B`, pinned to commit `2c3e5074…`: the complete `gemma-4-e2b-it-4bit` snapshot — `model.safetensors` 3.58 GB + tokenizer/config/chat-template, 8 files, each SHA-256 + size verified) into `Application Support/llm/gemma-4-e2b-it-4bit/`, registered as a sibling on `ModelStores` (outside the transcription-engine `store(for:)`/`readyEngines` machinery), surfaced via `CleanupModelSection` (download/progress/delete, ~3.4 GB copy, same shape as `WhisperModelSection`/`ParakeetModelSection`). The app loads from the downloaded directory via `LLMModelFactory.loadContainer(from:using:)` — so **`MLXLanguageModel` gained a `Source` enum** (`.directory` for the app, `.repoId` for the DEBUG smoke; the smoke keeps the HubClient path, so `swift-huggingface` stays used and no dep was removed). **`Cleaner`** (`@MainActor @Observable`, the cleanup analogue of `ReTranscriber`) gates on `store.status == .ready`, runs `clean()` off a directory-loaded model, returns a non-destructive `Outcome`, and `evict()`s the ~2.7 GB model when the note is left. **`Note`** gained additive `cleanedTranscript`/`cleanupModel` (+ `isCleaned`/`applyCleanup`/`clearCleanup`) — raw `transcript` is never overwritten. **`NoteDetailView`**: a "Clean up" control (→ "Set up cleanup model" deep-link to the Tuning sheet when the model's absent, via an `onOpenSettings` closure threaded `ContentView`→`NotesListView`→detail), a before/after **Accept/Decline** sheet (`CleanupOutcomeSheet`), and a cleaned/raw display toggle + "Cleaned with …" provenance + "Remove". Tests (sim-safe): `CleanerTests` (gating + generic error message), `NoteTests` cleanup helpers + persistence, `DownloadableModelStoreTests` gemma spec pinning + `CleanupModelStore` readiness/subdirectory. Build + full simulator suite (20 suites) green. Plan: `plan.L2.md` §1 (re-sequencing note) + §6.
 - **L2.4 device-validated (iPhone 15 Pro Max) — in-app cleanup works end-to-end.** Downloaded the cleanup model via the Tuning `CleanupModelSection` (the SHA-pinned `CleanupModelStore` path → `Application Support/llm/gemma-4-e2b-it-4bit/`, separate from the smoke's HubClient cache), then ran "Clean up" on a real note: the before/after sheet showed the cleaned candidate, Accept persisted `cleanedTranscript`/`cleanupModel` with the raw `transcript` preserved. Confirms the directory-load path (`loadContainer(from:using:)`) and the whole gated flow on device. **L2 has shipped a usable on-device cleanup feature** — the L2 thesis (cleanup mitigates STT errors) is now dogfoodable on real notes; remaining L2 polish (precise tok/s; optional model picker / formal A/B via the deferred L2.3 harness) is non-blocking.
+- **Onboarding doc — `docs/relay-notes-ios-guide.html`.** A self-contained, dependency-free HTML guide aimed at a seasoned programmer who is new to iOS/Swift: a "Rosetta Stone" mapping familiar concepts (interface→`protocol`, sum type→`enum` w/ associated values, ORM→SwiftData, UI-thread→`@MainActor`, Stream/Observable→`AsyncStream`, DI scope→`@Environment`) onto this codebase, then a guided tour of the provider-abstraction spine (`Transcriber`/`TranscriptionSession`/`TranscriptionEngine`/`TranscriberFactory` + the `TranscriptionOptions` sum type) and an end-to-end `tap → speak → saved` data-flow walkthrough traced through the real types (`RecorderViewModel` state machine, `LiveAudioEngine` double-duty tap, the three `AsyncStream`s, SwiftData `Note` save). Covers the Swift-6-strict-concurrency story (actors, `@MainActor`-by-default, the `nonisolated protocol` trap, `@unchecked Sendable` on `TapState`) and the iOS realities (permissions, `AVAudioSession`, background-audio `Info.plist`, simulator-can't-run-MLX, 7-day free-tier signing). **Four hand-authored inline-SVG architecture diagrams** orient the reader: a 5-layer system map, a concurrency isolation-domains map, a runtime swimlane for tap→speak→saved, and the recorder state-machine. Self-contained (no JS libs): a vanilla-JS Swift/bash highlighter + sidebar scroll-spy; teal→indigo app-icon palette. Reference/onboarding artifact, not a code change — no app behavior touched.
+
+## 2026-06-15
+
+- **Onboarding guide synced to T2 + L2 (`docs/relay-notes-ios-guide.html`).** Merged `main` into the doc branch and brought the guide current with the two features that landed since it was written: **Parakeet (third on-device transcription engine)** and **L2 on-device LLM cleanup (the `LanguageModel` spine)**. Substantive edits: framed the provider abstraction as **used twice** (transcription + cleanup) rather than "reserved for a future stage"; the tour now shows **three engines** (Apple Speech as the *permanent* default, Whisper + Parakeet opt-in) plus an optional "Clean up" pipeline step; §06 gained a **"The spine, proven twice: `LanguageModel`"** subsection (protocol + `MLXLanguageModel` actor + `Cleaner` + non-destructive `Note.cleanedTranscript`), and the factory snippet now shows the **single-live-MLX-engine eviction** + the `ModelStores` registry; §09 became a **three-engine** comparison and notes the cleanup LLM as a fourth MLX actor; §08 documents the additive/non-destructive cleanup fields; §10 fixed the now-stale signing note and added a card on the **`increased-memory-limit` entitlement** (accepted on the free tier); §11 updated to ~150 tests + the new `mlx-swift-lm` / `swift-huggingface` / `swift-transformers` deps. **Redrew the layered-map diagram** (6 columns, the two protocol spines in a teal/indigo seam, Parakeet/`MLXLanguageModel`/`ModelStores`/`Cleaner` added) and updated the isolation-domains actor box. Diagrams re-rendered + eyeballed (cairosvg). Docs-only; no app behavior touched.
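
The additive, non-destructive cleanup fields described in the L2 entries above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the app's code: `NoteSketch` and `displayText` are hypothetical names, the real `Note` is a SwiftData model rather than a plain struct, and the parameter labels on `applyCleanup` are guesses. Only the names `transcript`, `cleanedTranscript`, `cleanupModel`, `isCleaned`, `applyCleanup`, and `clearCleanup` come from the changelog.

```swift
// Hypothetical stand-in for the app's SwiftData `Note`.
// A plain struct keeps the sketch self-contained and runnable.
struct NoteSketch {
    /// Raw speech-to-text output. Declared `let` so cleanup can never overwrite it.
    let transcript: String
    /// Additive field: nil until a cleaned candidate is accepted.
    var cleanedTranscript: String?
    /// Provenance for the "Cleaned with …" label: which model produced the cleanup.
    var cleanupModel: String?

    /// True once a cleanup has been accepted and stored.
    var isCleaned: Bool { cleanedTranscript != nil }

    /// Store an accepted cleanup next to the raw transcript.
    /// (Parameter labels are assumptions, not the app's actual signature.)
    mutating func applyCleanup(_ text: String, model: String) {
        cleanedTranscript = text
        cleanupModel = model
    }

    /// "Remove": drop the cleanup and fall back to the raw transcript.
    mutating func clearCleanup() {
        cleanedTranscript = nil
        cleanupModel = nil
    }

    /// Hypothetical helper for the cleaned/raw display toggle's default view.
    var displayText: String { cleanedTranscript ?? transcript }
}

// Usage: accept a cleaned candidate, then inspect both versions.
var note = NoteSketch(transcript: "um so the the onboarding is is done")
note.applyCleanup("So the onboarding is done.", model: "gemma-4-e2b-it-4bit")
```

Because the raw text lives in an immutable property and the cleanup lives in optional siblings, declining or removing a cleanup is just a matter of resetting the optionals; no copy of the original needs to be kept elsewhere.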