5 changes: 5 additions & 0 deletions CHANGE_LOG.md
@@ -140,3 +140,8 @@ Entry style: bold lead-in summarizing what shipped, then the *why* / non-obvious
- **L2.0 device-validated (iPhone 15 Pro Max) — on-device LLM cleanup works; the thesis lands on the first real run.** Two findings. (1) **The QAT build doesn't load.** `mlx-community/gemma-4-E2B-it-qat-4bit` failed with `keyNotFound(language_model.model.layers.15.self_attn.k_proj.weight)`: Gemma 4 uses **KV-cache sharing** (layers 15–34 of 35 reuse earlier layers' K/V, so the checkpoint omits their `k_proj`/`v_proj`), but MLXLLM 3.31.3's `Gemma4Attention` declares a `kProj` Linear for *every* layer (runtime handles sharing via a `sharedKV` path; load-time doesn't), so weight-mapping throws at layer 15. Verified via the two repos' `model.safetensors.index.json`: the QAT build has `k_proj` on layers 0–14 only; the **non-QAT `mlx-community/gemma-4-e2b-it-4bit`** (the library's registered `LLMRegistry.gemma4_e2b_it_4bit` preset) materializes `k_proj` on all 35 layers. **Switched the smoke primary QAT → non-QAT** (one line) — the plan's "prefer `-qat-4bit`" is overridden by tooling (`plan.L2.md` §4). The download + bridge were never at fault (the QAT run downloaded 4.3 GB and mapped weights cleanly through layer 14). (2) **It runs, and runs well.** Non-QAT Gemma 4 E2B loaded in **3.4 s** (resident floor **2.67 GB**) and cleaned the inline sample to a genuinely good result — fillers (`um`/`uh`) removed, doubled words fixed (`the the`→`the`, `is is`→`is`), punctuation + sentence breaks added, `on boarding`→`onboarding`, `theres`→`there's`, **all content preserved, nothing invented** — an early but strong signal for the L2 bet (cleanup mitigates STT mess). Throughput **~23 tok/s** (approx, word-count-derived; precise streamed tok/s deferred to L2.3). **Memory:** generation peak `phys_footprint` **3.02 GB** — *just* under the ~3 GB no-entitlement jetsam ceiling **on a tiny ~50-word output**; a real multi-minute note's larger KV cache will likely cross it, so the entitlement stays justified (re-measure on a long fixture at L2.2/L2.3). 
**Entitlement resolved (§10 Q3):** the **free Apple ID tier accepts `increased-memory-limit`** — the device build signed, installed, and ran with it; no V1.4 trigger. L2.0 (+ most of L2.1) done. Next: **L2.2** — capture real Apple-Speech transcripts into `cleanup_fixtures.json`, then **L2.3** the head-to-head verdict.
- **L2 re-sequenced — in-app "Clean up" MVP pulled forward ahead of the fixtures/harness route (code-complete; device-pending).** Since L2.0 proved on-device cleanup works with strong quality, the "harness-first, UI-last" caution was retired and the real feature became the evaluation vehicle (dogfooding real notes beats curated fixtures and yields the transcripts for free). **L2.2 fixtures superseded; L2.3 model A/B deferred** (the `LLMCleanupSmoke` repo-id path still sweeps candidates when wanted). The feature, mirroring the re-transcribe workflow and decoupled from MLX behind the `LanguageModel` protocol: **centralized model management in the Tuning sheet** — `CleanupModelStore` (a `DownloadableModelStore` bound to the new `ModelDownloadSpec.gemmaCleanupE2B`, pinned to commit `2c3e5074…`: the complete `gemma-4-e2b-it-4bit` snapshot — `model.safetensors` 3.58 GB + tokenizer/config/chat-template, 8 files, each SHA-256 + size verified) into `Application Support/llm/gemma-4-e2b-it-4bit/`, registered as a sibling on `ModelStores` (outside the transcription-engine `store(for:)`/`readyEngines` machinery), surfaced via `CleanupModelSection` (download/progress/delete, ~3.4 GB copy, same shape as `WhisperModelSection`/`ParakeetModelSection`). The app loads from the downloaded directory via `LLMModelFactory.loadContainer(from:using:)` — so **`MLXLanguageModel` gained a `Source` enum** (`.directory` for the app, `.repoId` for the DEBUG smoke; the smoke keeps the HubClient path, so `swift-huggingface` stays used and no dep was removed). **`Cleaner`** (`@MainActor @Observable`, the cleanup analogue of `ReTranscriber`) gates on `store.status == .ready`, runs `clean()` off a directory-loaded model, returns a non-destructive `Outcome`, and `evict()`s the ~2.7 GB model when the note is left. **`Note`** gained additive `cleanedTranscript`/`cleanupModel` (+ `isCleaned`/`applyCleanup`/`clearCleanup`) — raw `transcript` is never overwritten. 
**`NoteDetailView`**: a "Clean up" control (→ "Set up cleanup model" deep-link to the Tuning sheet when the model's absent, via an `onOpenSettings` closure threaded `ContentView`→`NotesListView`→detail), a before/after **Accept/Decline** sheet (`CleanupOutcomeSheet`), and a cleaned/raw display toggle + "Cleaned with …" provenance + "Remove". Tests (sim-safe): `CleanerTests` (gating + generic error message), `NoteTests` cleanup helpers + persistence, `DownloadableModelStoreTests` gemma spec pinning + `CleanupModelStore` readiness/subdirectory. Build + full simulator suite (20 suites) green. Plan: `plan.L2.md` §1 (re-sequencing note) + §6.
- **L2.4 device-validated (iPhone 15 Pro Max) — in-app cleanup works end-to-end.** Downloaded the cleanup model via the Tuning `CleanupModelSection` (the SHA-pinned `CleanupModelStore` path → `Application Support/llm/gemma-4-e2b-it-4bit/`, separate from the smoke's HubClient cache), then ran "Clean up" on a real note: the before/after sheet showed the cleaned candidate, Accept persisted `cleanedTranscript`/`cleanupModel` with the raw `transcript` preserved. Confirms the directory-load path (`loadContainer(from:using:)`) and the whole gated flow on device. **L2 has shipped a usable on-device cleanup feature** — the L2 thesis (cleanup mitigates STT errors) is now dogfoodable on real notes; remaining L2 polish (precise tok/s; optional model picker / formal A/B via the deferred L2.3 harness) is non-blocking.
- **Onboarding doc — `docs/relay-notes-ios-guide.html`.** A self-contained, dependency-free HTML guide aimed at a seasoned programmer who is new to iOS/Swift: a "Rosetta Stone" mapping familiar concepts (interface→`protocol`, sum type→`enum` w/ associated values, ORM→SwiftData, UI-thread→`@MainActor`, Stream/Observable→`AsyncStream`, DI scope→`@Environment`) onto this codebase, then a guided tour of the provider-abstraction spine (`Transcriber`/`TranscriptionSession`/`TranscriptionEngine`/`TranscriberFactory` + the `TranscriptionOptions` sum type) and an end-to-end `tap → speak → saved` data-flow walkthrough traced through the real types (`RecorderViewModel` state machine, `LiveAudioEngine` double-duty tap, the three `AsyncStream`s, SwiftData `Note` save). Covers the Swift-6-strict-concurrency story (actors, `@MainActor`-by-default, the `nonisolated protocol` trap, `@unchecked Sendable` on `TapState`) and the iOS realities (permissions, `AVAudioSession`, background-audio `Info.plist`, simulator-can't-run-MLX, 7-day free-tier signing). **Four hand-authored inline-SVG architecture diagrams** orient the reader: a 5-layer system map, a concurrency isolation-domains map, a runtime swimlane for tap→speak→saved, and the recorder state-machine. Self-contained (no JS libs): a vanilla-JS Swift/bash highlighter + sidebar scroll-spy; teal→indigo app-icon palette. Reference/onboarding artifact, not a code change — no app behavior touched.
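
The checkpoint check described in the L2.0 entry above (which layers carry a `k_proj` weight in `model.safetensors.index.json`) can be sketched roughly as follows. This is a minimal illustration, not the procedure actually used: `layersWithKProj` is a hypothetical helper, and it assumes the index is a JSON object with a `weight_map` whose keys are shaped like the one in the reported error (`language_model.model.layers.<N>.self_attn.k_proj.weight`). Real checkpoints may name keys differently.

```swift
import Foundation

/// Hypothetical helper: returns the sorted layer indices that have a
/// `self_attn.k_proj.weight` entry in a safetensors index `weight_map`.
func layersWithKProj(indexJSON: Data) throws -> [Int] {
    // Assumption: the index is a JSON object whose "weight_map" maps
    // tensor names to the shard file that stores them.
    let root = try JSONSerialization.jsonObject(with: indexJSON) as? [String: Any]
    let weightMap = root?["weight_map"] as? [String: Any] ?? [:]

    var layers = Set<Int>()
    for key in weightMap.keys
    where key.hasSuffix(".self_attn.k_proj.weight") && key.contains("language_model") {
        // Assumed key shape: "<prefix>.layers.<N>.self_attn.k_proj.weight"
        let parts = key.split(separator: ".")
        if let i = parts.firstIndex(of: "layers"), i + 1 < parts.count,
           let n = Int(parts[i + 1]) {
            layers.insert(n)
        }
    }
    return layers.sorted()
}

// Tiny inline sample standing in for a real index file.
let sampleIndex = Data("""
{"metadata": {}, "weight_map": {
  "language_model.model.layers.0.self_attn.k_proj.weight": "model.safetensors",
  "language_model.model.layers.14.self_attn.k_proj.weight": "model.safetensors",
  "language_model.model.layers.15.self_attn.q_proj.weight": "model.safetensors"
}}
""".utf8)

let present = (try? layersWithKProj(indexJSON: sampleIndex)) ?? []
print(present) // [0, 14]
```

Run against each repo's index, a result that stops at layer 14 would match the QAT finding, while a result covering every layer would match the non-QAT build.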

## 2026-06-15

- **Onboarding guide synced to T2 + L2 (`docs/relay-notes-ios-guide.html`).** Merged `main` into the doc branch and brought the guide current with the two features that landed since it was written: **Parakeet (third on-device transcription engine)** and **L2 on-device LLM cleanup (the `LanguageModel` spine)**. Substantive edits: framed the provider abstraction as **used twice** (transcription + cleanup) rather than "reserved for a future stage"; the tour now shows **three engines** (Apple Speech as the *permanent* default, Whisper + Parakeet opt-in) plus an optional "Clean up" pipeline step; §06 gained a **"The spine, proven twice: `LanguageModel`"** subsection (protocol + `MLXLanguageModel` actor + `Cleaner` + non-destructive `Note.cleanedTranscript`), and the factory snippet now shows the **single-live-MLX-engine eviction** + the `ModelStores` registry; §09 became a **three-engine** comparison and notes the cleanup LLM as a fourth MLX actor; §08 documents the additive/non-destructive cleanup fields; §10 fixed the now-stale signing note and added a card on the **`increased-memory-limit` entitlement** (accepted on the free tier); §11 updated to ~150 tests + the new `mlx-swift-lm` / `swift-huggingface` / `swift-transformers` deps. **Redrew the layered-map diagram** (6 columns, the two protocol spines in a teal/indigo seam, Parakeet/`MLXLanguageModel`/`ModelStores`/`Cleaner` added) and updated the isolation-domains actor box. Diagrams re-rendered + eyeballed (cairosvg). Docs-only; no app behavior touched.
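
The additive, non-destructive cleanup fields mentioned above can be pictured with a small sketch. It assumes a plain value type rather than the app's SwiftData `Note`: `NoteSketch` and `displayText(preferCleaned:)` are hypothetical, and the parameter lists of `applyCleanup`/`clearCleanup` are guesses. Only the member names `cleanedTranscript`, `cleanupModel`, `isCleaned`, `applyCleanup` and `clearCleanup` come from the entries above.

```swift
import Foundation

/// Hypothetical stand-in for the app's `Note` (the real type is a SwiftData model).
struct NoteSketch {
    /// Raw speech-to-text output; cleanup never overwrites it.
    let transcript: String
    /// Additive fields: both are nil until a cleanup is accepted.
    private(set) var cleanedTranscript: String?
    private(set) var cleanupModel: String?

    init(transcript: String) {
        self.transcript = transcript
    }

    var isCleaned: Bool { cleanedTranscript != nil }

    /// Hypothetical display helper: cleaned text when requested and available, else raw.
    func displayText(preferCleaned: Bool) -> String {
        preferCleaned ? (cleanedTranscript ?? transcript) : transcript
    }

    /// Stores the accepted candidate plus provenance (assumed signature).
    mutating func applyCleanup(_ text: String, model: String) {
        cleanedTranscript = text
        cleanupModel = model
    }

    /// Drops the cleaned copy; the raw transcript is untouched.
    mutating func clearCleanup() {
        cleanedTranscript = nil
        cleanupModel = nil
    }
}

var note = NoteSketch(transcript: "um the the plan is is ready")
note.applyCleanup("The plan is ready.", model: "gemma-4-e2b-it-4bit")
print(note.displayText(preferCleaned: true))  // The plan is ready.
note.clearCleanup()
print(note.displayText(preferCleaned: true))  // um the the plan is is ready
```

Keeping the raw text immutable and the cleaned text optional is what makes "Remove" and the cleaned/raw toggle cheap: neither needs to reconstruct anything.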