diff --git a/CHANGE_LOG.md b/CHANGE_LOG.md index 46672f3..06ad452 100644 --- a/CHANGE_LOG.md +++ b/CHANGE_LOG.md @@ -165,6 +165,11 @@ Entry style: bold lead-in summarizing what shipped, then the *why* / non-obvious - **Swift best-practices pass, round 2 — view/config dedup (open-source readiness).** Took on the lower-risk "larger refactors" from `planning/swift-best-practices-review.md`. (1) **Model-download sections deduped** — `WhisperModelSection`/`ParakeetModelSection`/`CleanupModelSection` were ~90% identical (status switch, download/cancel/delete, confirm-alert); extracted `Views/ModelDownloadSection.swift` typed on the `@Observable` `DownloadableModelStore` base, leaving the three as thin wrappers that supply only copy + `onDeleted`. `WhisperModelSection.failureMessage` kept (test-referenced); `SettingsView` call sites unchanged; auto-included via the synchronized group (no pbxproj edit). View composition only — provider abstraction untouched. (2) **`SettingsView.engineRow(_:title:subtitle:isEnabled:)`** extracted from three byte-identical engine `Button` rows. (3) **Parakeet config chain `Codable` → `Decodable`** + deleted the dead `ParakeetDecodingConfig.encode(to:)` (configs are decode-only — no `JSONEncoder` in the tree; the custom encoder only existed because `maxSymbols` has no `CodingKeys` case, and the whole chain moved since the parent's synthesized `Encodable` required the child's). **Deferred** (do with a compiler / `ParakeetSmoke` in the loop): the `ParakeetDecoder` per-step `fatalError` → stored-`Linear` fix (MLXNN parameter-tree keying is load-bearing) and the Whisper KV-cache `typealias` (cosmetic, high-churn, type-label risk). **Declined:** `presetLabel`/`presetID` → `switch` — `SpeechTranscriber.Preset` is an Equatable struct, not a closed enum. **Not compiled here** — needs an Xcode build + test run. - **Swift best-practices pass, round 3 — lower-priority polish (open-source readiness), build- and test-verified.** First of the best-practices passes actually compiled + tested locally (`xcodebuild build` + the 61-test suite, both green on the iPhone 17 Pro simulator). No behavior change on any path. Items, keyed to `planning/swift-best-practices-review.md`: (C) **"GB mismatch" was a false alarm** — the cleanup-model spec is authoritatively `downloadSizeMB: 3446` (≈3.4 GB on disk, matching the UI copy and the summed `RemoteFile.size`s); the two stray "~2.7 GB" comments (`Cleaner.evict`, `NoteDetailView.onDisappear`) describe *resident memory freed*, a genuinely different quantity (Gemma E2B's per-layer-embedding/MatFormer design makes RAM < disk), so they were clarified as "resident memory" rather than forced to match the download. (D) **Magic numbers named** — `ParakeetAudio`'s two same-valued `1e-5`s split into `logMelFloor` (keeps `log()` finite on silent bins) vs `normEpsilon` (divide-by-zero guard in z-scoring); plus `AudioPlayer`'s `endSnapThreshold` (0.05 s end-snap), `LiveAudioEngine`'s `resamplerHeadroom` (1024-frame converter slack), `Note`'s `tieBreakNudge` (1 ms same-instant ordering tie-break). (E) **`NotesListView.deleteNotes(at:)`** gained an invariant comment: the `IndexSet` offsets index into `filteredNotes` *because* that's the `ForEach` source — keep them the same array or swipe-delete hits the wrong rows. (J) **`DownloadCoordinator`'s `unexpectedHTTPStatus(-1)` sentinel** (the "can't happen, download succeeded with no file" guard) replaced by a distinct `CoordinatorError.missingDownloadResult` so a fake `-1` never reads as a real HTTP status in logs; it falls through to the generic network-failure arm. **Deferred** (held for a focused build/test pass): item B (KV-cache `typealias`), F (`stopAndTranscribe` teardown dedup — note the doc's "cancel up-front" can't nil `session` before `session.finish()`; capture a local), G (`ModelStores` init seam). **Left alone:** A (`ParakeetDecoder` stored `Linear` — MLXNN param-key risk), H (error-type naming churn), I (formatter consolidation — stale framing: `RecorderView` already delegates to `RecorderViewModel.formatElapsed`). - **Swift best-practices, item B — KV-cache `typealias` (build + test verified).** Named the Whisper decoder's cryptic cache tuples: `WhisperKV = (MLXArray, MLXArray)` (one attention's keys/values) and `WhisperLayerKVCache = (WhisperKV?, WhisperKV?)` (a decoder block's self- + cross-attention cache), declared at the top of `Transcription/WhisperModel.swift` and substituted at all 11 sites across `WhisperModel.swift` + `WhisperDecoding.swift` (the `ResidualAttentionBlock`/`TextDecoder`/`WhisperAttention` signatures, the `[WhisperLayerKVCache]` per-block arrays, and `WhisperDecoding`'s decode loop). Replaces the repeated `((MLXArray, MLXArray)?, (MLXArray, MLXArray)?)` so element chains like `kvCache[0].0` (block 0's self-attn KV) read against a named type. Pure compile-time substitution — identical underlying types, zero runtime change — so the simulator build + 61-test suite fully verify it without a device run. This was the last of the "deferred — needs a compiler in the loop" items worth taking; A (`ParakeetDecoder` stored `Linear`) stays deferred on the MLXNN param-key risk. +- **Swift best-practices, round 4 — F, G, docstrings (build + test verified).** Closes the last simulator-validatable items from `planning/swift-best-practices-review.md`; `xcodebuild build` + the 61-test suite green on the iPhone 17 Pro simulator, no behavior change on any path. (G) **`ModelStores`' six `init` overloads → one `nil`-defaulted init** (`whisper:parakeet:cleanup:`, each slot `?? Real…Store()`). The `nil` defaults sidestep the original blocker (a non-`nil` default arg is evaluated in a nonisolated context and can't call the `@MainActor` store init) by deferring construction to the `??` fallbacks in the `@MainActor` init body; every call site (production `ModelStores()` + the labeled test combos in `ModelStoresTests`/`ReTranscriberTests`/`WhisperMLXTranscriberTests`) is unchanged. (F) **`RecorderViewModel.stopAndTranscribe()` teardown dedup** — the two identical light teardowns (guard-else + success) collapse into one up-front block: cancel `updatesTask`, capture `let session = self.session`, nil the stored `session`/`currentAudioURL`, then `finish()`/`cancel()` on the *local*. State is already `.finalizing`, so cancelling the partials loop early is a no-op; the catch paths now `await session.cancel()` on the local (the only `cleanupAfterFailure()` work not already done up-front), preserving behavior without the heavier helper. (Docstrings) **First docstring-coverage pass** (CodeRabbit CHILL gate) — backfilled `///` on the transcriber entry-point spine + recording view model: `ModelStores`, `RecorderViewModel` (state machine + public methods + deps), `TranscriberFactory.transcriber(for:)`, the `Transcriber`/`TranscriptionSession` protocols (incl. the load-bearing two-method `Transcriber` rationale), and `AppleSpeechTranscriber` (was 0% — the default engine). Targeted, not metric-chased: the gate re-measures each PR's changed set, so the durable fix is the standing "every new public/internal decl ships a doc comment" rule, not filler on private storage / obvious `init`s. Only item **A** (`ParakeetDecoder` stored `Linear`) now remains, deferred on the device-only `ParakeetSmoke` validation. + +## 2026-06-20 + +- **Swift best-practices, item A — `ParakeetDecoder` per-step `fatalError` removed (device-validated). Closes the review backlog.** The joint network's `callAsFunction` cast `joint_net[2] as? Linear` on every decode step and `fatalError`'d on miss. The review's original sketch — resolve the final `Linear` into a stored property — was **rejected**: a second `Linear`-typed property aliases `joint_net[2]`, adding a duplicate parameter key (`joint.finalLinear.*`) that breaks `update(verify: .all)` / save round-trips, trading a crash path for a latent loading bug. **Applied instead:** retype `let joint_net: [Module]` → `[UnaryLayer]` (all three slots — `ReLU`, `ParakeetIdentity`, `Linear` — already conform to `UnaryLayer`), so the joint applies the trailing Linear as `joint_net[2](activated)` directly — no cast, no `fatalError`, no new stored property. **Why the `joint_net` snake_case key survives the retype** (the load-bearing risk the review flagged): MLXNN keys array children by *runtime value*, not static element type — `MLXNN.ModuleValue.build(value:)` casts the array to `[Any]` and inspects each element `as Module`, so `joint.joint_net.0/1/2` is derived identically for `[Module]` and `[UnaryLayer]` (verified by reading mlx-swift 0.31.4 `Source/MLXNN/Module.swift`). Since it's the same `Linear` with the same weights applied the same way, output is numerically identical to the pre-change path. **Validated on the iPhone 15 Pro Max** (CLI device build → `devicectl install` → `ParakeetSmoke.run()` via the `#if DEBUG` Settings button, captured over `idevicesyslog`): full model loaded under `verify: .noUnusedKeys` (which *throws* if `joint.joint_net.2.*` goes unconsumed — it didn't), T2.1d decode `substring check ("openly shouldered the burden") = PASS ✅` (282 ms), T2.1e `chunking check = PASS ✅`, peak process footprint 1345.5 MB (unchanged, still fits without `increased-memory-limit`). Simulator build + 61-test suite also green. With this the `planning/swift-best-practices-review.md` backlog is empty — every actionable item applied; only cosmetic H/I and the declined `preset → switch` were intentionally skipped. ## 2026-06-17 diff --git a/Relay Notes/Recording/RecorderViewModel.swift b/Relay Notes/Recording/RecorderViewModel.swift index 92db59d..175ed53 100644 --- a/Relay Notes/Recording/RecorderViewModel.swift +++ b/Relay Notes/Recording/RecorderViewModel.swift @@ -3,9 +3,17 @@ import Foundation import Observation import SwiftData +/// Drives the tap-to-record flow: owns the capture engine + streaming +/// transcription session, advances the recording state machine, and persists the +/// finished `Note`. Constructed lazily in `ContentView.task` and read by +/// `RecorderView`. @MainActor @Observable final class RecorderViewModel { + /// The recording state machine. `partial` carries the latest streamed + /// transcript through `.recording`/`.paused` (empty for non-streaming + /// engines); `.paused` is driven by `AVAudioSession` interruptions, not the + /// user. See the "Tap-to-record state machine" note in CLAUDE.md. enum State: Equatable { case idle case recording(partial: String) @@ -15,6 +23,7 @@ final class RecorderViewModel { case failed(message: String) } + /// Current point in the recording flow; the single source of truth `RecorderView` renders. private(set) var state: State = .idle /// Whether the active recording's engine streams live partials. Captured @@ -34,16 +43,24 @@ final class RecorderViewModel { /// during interruption pauses). Shown in the Whisper placeholder. private(set) var elapsed: Duration = .zero + /// Live recording/transcription dials, also surfaced to `SettingsView`. A + /// snapshot is read at record start, so mid-recording edits don't take effect + /// until the next session (see `Tunings`). let tunings: Tunings private let engine: LiveAudioEngine private let transcriberFactory: TranscriberFactory private let modelContext: ModelContext + /// The active streaming session, set at record start and released at finalize. private var session: (any TranscriptionSession)? + /// Forwards captured audio buffers to the session (and drives the meter). private var feedTask: Task? + /// Mirrors the session's partial-transcript stream into `state`. private var updatesTask: Task? + /// Relays `AVAudioSession` interruption events into the state machine. private var interruptionTask: Task? + /// Accumulates `elapsed` for the non-streaming placeholder; nil otherwise. private var elapsedTask: Task? private var currentAudioURL: URL? @@ -59,6 +76,10 @@ final class RecorderViewModel { self.tunings = tunings } + /// Starts a streaming session and audio capture, wiring the feed, partials, + /// interruption, and (for non-streaming engines) elapsed-ticker tasks, then + /// enters `.recording`. Any failure tears everything down and lands in + /// `.failed` with a generic, actionable message (specifics stay in logs). func startRecording() async { do { let transcriber = transcriberFactory.transcriber(for: tunings.engine) @@ -123,6 +144,10 @@ final class RecorderViewModel { } } + /// Stops capture, finalizes the session into a transcript, and persists a + /// `Note` (audio + transcript + model provenance). No-op unless currently + /// `.recording`/`.paused`. On a missing recording or a finalize/empty-speech + /// failure, deletes the orphaned audio and lands in `.failed`. func stopAndTranscribe() async { switch state { case .recording, .paused: break @@ -138,22 +163,26 @@ final class RecorderViewModel { elapsedTask?.cancel() elapsedTask = nil + // Light teardown shared by both the guard-else failure and the success + // paths: cancel the partials loop and release the stored session up-front, + // finishing on a local handle. State is already `.finalizing`, so the + // partials loop is a no-op from here — cancelling it early changes nothing. + // (The catch paths still need this `session` to `cancel()` it, hence the + // local; they don't route through `cleanupAfterFailure()`, whose remaining + // work is already done here.) + updatesTask?.cancel() + updatesTask = nil + let session = self.session + self.session = nil + currentAudioURL = nil + guard let session, let url else { - updatesTask?.cancel() - updatesTask = nil - self.session = nil - currentAudioURL = nil state = .failed(message: "Recording could not be saved. Please try again.") return } do { let transcript = try await session.finish() - updatesTask?.cancel() - updatesTask = nil - self.session = nil - currentAudioURL = nil - let note = Note( audioFilename: url.lastPathComponent, transcript: transcript, @@ -163,16 +192,18 @@ final class RecorderViewModel { try modelContext.save() state = .finished(transcript: transcript) } catch TranscriptionError.noSpeechDetected { - await cleanupAfterFailure() + await session.cancel() try? FileManager.default.removeItem(at: url) state = .failed(message: "We didn't hear any speech. Try again and speak clearly.") } catch { - await cleanupAfterFailure() + await session.cancel() try? FileManager.default.removeItem(at: url) state = .failed(message: "Something went wrong transcribing your note. Please try again.") } } + /// Returns to `.idle` after the user dismisses a `.finished`/`.failed` result, + /// readying the recorder for the next take. func reset() { state = .idle } @@ -192,6 +223,9 @@ final class RecorderViewModel { } } + /// Applies an `AVAudioSession` interruption: pure pause/resume transitions via + /// `nextState`, or — on `.stopped` while paused — auto-finalizes so the + /// captured audio isn't lost (there's no manual-resume affordance). private func handleInterruption(_ event: InterruptionEvent) async { if let next = Self.nextState(for: event, from: state) { state = next @@ -264,6 +298,9 @@ final class RecorderViewModel { return String(format: "%d:%02d", totalSeconds / 60, totalSeconds % 60) } + /// Heavy teardown for the failure paths: cancels every task and `cancel()`s + /// the session (vs. the lighter inline teardown in `stopAndTranscribe()`'s + /// success path, which `finish()`es the session instead). private func cleanupAfterFailure() async { feedTask?.cancel() feedTask = nil diff --git a/Relay Notes/Transcription/AppleSpeechTranscriber.swift b/Relay Notes/Transcription/AppleSpeechTranscriber.swift index f10241f..6f3e352 100644 --- a/Relay Notes/Transcription/AppleSpeechTranscriber.swift +++ b/Relay Notes/Transcription/AppleSpeechTranscriber.swift @@ -2,6 +2,10 @@ import AVFoundation import Foundation import Speech +/// `Transcriber` over Apple's on-device `SpeechAnalyzer`/`SpeechTranscriber` — the +/// app's default engine. `nonisolated` (no MLX, no shared mutable state); both the +/// file-based and streaming paths request authorization, resolve a supported +/// locale, install the on-device model assets, then run the analyzer. nonisolated final class AppleSpeechTranscriber: Transcriber { private let locale: Locale @@ -9,6 +13,8 @@ nonisolated final class AppleSpeechTranscriber: Transcriber { self.locale = locale } + /// One-shot file transcription via `SpeechAnalyzer.analyzeSequence(from:)`, + /// joining recognized segments. Throws `.noSpeechDetected` on an empty result. func transcribe(_ audio: URL, options: TranscriptionOptions) async throws -> String { guard case .apple(let appleOptions) = options else { preconditionFailure("AppleSpeechTranscriber received non-apple options — factory and engine selection are out of sync") @@ -79,6 +85,8 @@ nonisolated final class AppleSpeechTranscriber: Transcriber { return transcript } + /// Opens a live session: starts the analyzer over an `AnalyzerInput` stream and + /// returns an `AppleSpeechSession` that yields volatile + finalized partials. func makeStreamingSession(options: TranscriptionOptions) async throws -> any TranscriptionSession { guard case .apple(let appleOptions) = options else { preconditionFailure("AppleSpeechTranscriber received non-apple options — factory and engine selection are out of sync") @@ -144,6 +152,10 @@ nonisolated final class AppleSpeechTranscriber: Transcriber { } } +/// Live Apple Speech session. A background task drains `transcriber.results`, +/// yielding a running transcript to `updates`: volatile (in-progress) results are +/// appended after the finalized prefix so the live card updates as you speak, and +/// each `isFinal` result is committed into that prefix. private final class AppleSpeechSession: TranscriptionSession { let audioFormat: AVAudioFormat? let updates: AsyncStream @@ -203,6 +215,8 @@ private final class AppleSpeechSession: TranscriptionSession { inputContinuation.yield(AnalyzerInput(buffer: buffer)) } + /// Closes input, finalizes the analyzer through end-of-input, and returns the + /// committed transcript. Throws `.noSpeechDetected` when it's empty. func finish() async throws -> String { inputContinuation.finish() do { @@ -218,6 +232,8 @@ private final class AppleSpeechSession: TranscriptionSession { return transcript } + /// Tears the session down without a result: closes input, cancels the analyzer + /// and the results task, and finishes the `updates` stream. func cancel() async { inputContinuation.finish() await analyzer.cancelAndFinishNow() diff --git a/Relay Notes/Transcription/ModelStores.swift b/Relay Notes/Transcription/ModelStores.swift index f0d8501..b23f428 100644 --- a/Relay Notes/Transcription/ModelStores.swift +++ b/Relay Notes/Transcription/ModelStores.swift @@ -19,7 +19,9 @@ import Observation @MainActor @Observable final class ModelStores { + /// Backing store for the Whisper (small.en) MLX engine. let whisper: WhisperModelStore + /// Backing store for the Parakeet (TDT 0.6b) MLX engine. let parakeet: ParakeetModelStore /// The L2 cleanup (LLM) model store. Not a `TranscriptionEngine`, so it's a /// sibling here — outside the engine `store(for:)` / `readyEngines` machinery — @@ -27,51 +29,25 @@ final class ModelStores { /// Cleanup gating reads `cleanup.status` directly (see `NoteDetailView`). let cleanup: CleanupModelStore - init() { - self.whisper = WhisperModelStore() - self.parakeet = ParakeetModelStore() - self.cleanup = CleanupModelStore() - } - - // Explicit-store overloads for tests (e.g. a store bound to a temp directory). - // Each unspecified store is the real Application-Support-backed one. These are - // separate inits rather than defaulted parameters because a default value like - // `= WhisperModelStore()` is evaluated in a nonisolated context and can't call - // the `@MainActor` store init; constructing in-body (this `@MainActor` init) is - // fine. - - init(whisper: WhisperModelStore) { - self.whisper = whisper - self.parakeet = ParakeetModelStore() - self.cleanup = CleanupModelStore() - } - - init(parakeet: ParakeetModelStore) { - self.whisper = WhisperModelStore() - self.parakeet = parakeet - self.cleanup = CleanupModelStore() - } - - init(whisper: WhisperModelStore, parakeet: ParakeetModelStore) { - self.whisper = whisper - self.parakeet = parakeet - self.cleanup = CleanupModelStore() - } - - init(cleanup: CleanupModelStore) { - self.whisper = WhisperModelStore() - self.parakeet = ParakeetModelStore() - self.cleanup = cleanup - } - - /// All-explicit overload — lets a test pin every store to a temp directory so - /// engine readiness is deterministic *while* the cleanup store is `.ready`, - /// which is what makes "a ready cleanup model is excluded from engine gating" - /// (the design choice this registry encodes) directly assertable. - init(whisper: WhisperModelStore, parakeet: ParakeetModelStore, cleanup: CleanupModelStore) { - self.whisper = whisper - self.parakeet = parakeet - self.cleanup = cleanup + /// Builds the registry, defaulting any store the caller leaves out to the real + /// Application-Support-backed one. Production calls `ModelStores()` (all + /// defaulted); tests pass only the slots they pin to a temp directory — e.g. + /// `ModelStores(whisper:)` for deterministic Whisper readiness, or the + /// all-explicit form to assert "a ready cleanup model is excluded from engine + /// gating" (the design choice this registry encodes). + /// + /// The slots are `nil`-defaulted rather than `= WhisperModelStore()`-defaulted + /// because a non-`nil` default is evaluated in a nonisolated context and can't + /// call the `@MainActor` store init; the `??` fallbacks run in this `@MainActor` + /// init body, where constructing a store is fine. + init( + whisper: WhisperModelStore? = nil, + parakeet: ParakeetModelStore? = nil, + cleanup: CleanupModelStore? = nil + ) { + self.whisper = whisper ?? WhisperModelStore() + self.parakeet = parakeet ?? ParakeetModelStore() + self.cleanup = cleanup ?? CleanupModelStore() } /// The download store backing `engine`, or `nil` for engines with no diff --git a/Relay Notes/Transcription/Parakeet/ParakeetDecoder.swift b/Relay Notes/Transcription/Parakeet/ParakeetDecoder.swift index 29a8bd5..7ce71bc 100644 --- a/Relay Notes/Transcription/Parakeet/ParakeetDecoder.swift +++ b/Relay Notes/Transcription/Parakeet/ParakeetDecoder.swift @@ -115,7 +115,14 @@ nonisolated final class ParakeetJointNetwork: Module { /// `jointNet` would key as `joint.jointNet.*` and silently miss the /// safetensors `joint.joint_net.2.*` under `update(verify: .none)` — leaving /// the final Linear at random init and producing real-but-wrong tokens. - let joint_net: [Module] + /// + /// **Typed `[UnaryLayer]` (not `[Module]`)** so the joint can apply the + /// trailing Linear as `joint_net[2](…)` directly — every element is a + /// `UnaryLayer` — instead of a per-decode-step `as? Linear` cast + `fatalError`. + /// The element type does **not** affect weight keying: MLXNN reflects the array + /// by runtime value (`build(value:)` casts to `[Any]` and inspects each element + /// as `Module`), so each slot still keys by index (`joint_net.0/1/2`). + let joint_net: [UnaryLayer] nonisolated init(config: ParakeetJointConfig) { let net = config.jointnet @@ -133,10 +140,9 @@ nonisolated final class ParakeetJointNetwork: Module { let e = enc(encFrame).expandedDimensions(axis: 2) // [1, 1, 1, jointHidden] let p = pred(predOut).expandedDimensions(axis: 1) // [1, 1, 1, jointHidden] let activated = relu(e + p) - guard let finalLinear = joint_net[2] as? Linear else { - fatalError("joint_net[2] must be the final Linear") - } - return finalLinear(activated) + // `joint_net[2]` is the final `Linear` (slots 0/1 are the activation + + // identity that preserve the safetensors index); apply it via `UnaryLayer`. + return joint_net[2](activated) } } diff --git a/Relay Notes/Transcription/Transcriber.swift b/Relay Notes/Transcription/Transcriber.swift index 8d52f2b..f185fcf 100644 --- a/Relay Notes/Transcription/Transcriber.swift +++ b/Relay Notes/Transcription/Transcriber.swift @@ -2,12 +2,17 @@ import AVFoundation import Foundation import Speech +/// Per-engine transcription configuration, passed into `Transcriber`. Only Apple +/// Speech carries tunable options today; the MLX engines run fixed pipelines. enum TranscriptionOptions: Sendable { case apple(AppleSpeechOptions) case whisperMLX case parakeetMLX } +/// Apple Speech dials: the recognition `preset` and any `contextualStrings` +/// (domain terms/names biased toward during recognition). Defaults are set in +/// `Tunings`; what each does is in `planning/transcription-tuning.md`. struct AppleSpeechOptions: Sendable { var preset: SpeechTranscriber.Preset = .transcription var contextualStrings: [String] = [] @@ -21,8 +26,15 @@ struct AppleSpeechOptions: Sendable { /// `@MainActor` onto conformers — including onto an actor's synchronous `init` /// (see CHANGE_LOG 2026-06-11). Protocols are *not* on SE-0466's exemption list; /// declarations inside `actor` types are. +/// A live transcription session: the recorder feeds it captured audio buffers and +/// reads back partials, then `finish()`es it for the final transcript. One +/// session per recording; conformers pick their own isolation (hence +/// `nonisolated`, see below). nonisolated protocol TranscriptionSession: Sendable, AnyObject { + /// The PCM format this session wants buffers in; the capture engine converts + /// to it before `feed(_:)`. `nil` means "no preference" (use the tap format). var audioFormat: AVAudioFormat? { get } + /// Incremental transcript stream while recording (see `emitsLivePartials`). var updates: AsyncStream { get } /// Whether this session streams incremental partials through `updates` /// while recording. The session is the authority — it knows its own decode @@ -37,16 +49,31 @@ nonisolated protocol TranscriptionSession: Sendable, AnyObject { /// model it loaded — so it reports its own identity rather than the caller /// inferring it from the engine enum. var modelDescription: String { get } + /// Hand one captured audio buffer to the session (called from the audio feed). func feed(_ buffer: AVAudioPCMBuffer) + /// Close input and return the final transcript. Throws + /// `TranscriptionError.noSpeechDetected` when nothing was recognized. func finish() async throws -> String + /// Abandon the session and release its resources without producing a result. func cancel() async } +/// A transcription engine, hidden behind a protocol so the runtime provider is +/// swappable (the architecture spine — see CLAUDE.md). Conformers choose their own +/// isolation, so this is `nonisolated`. Both methods are intentional: the +/// streaming one is what the recorder uses today; the file-based one is currently +/// unused but reserved for cloud STT and a future "re-transcribe" action — **do +/// not remove it as dead code.** nonisolated protocol Transcriber: Sendable { + /// Transcribe an already-recorded audio file in one shot. Reserved (cloud STT, + /// re-transcribe); not on the live recording path today. func transcribe(_ audio: URL, options: TranscriptionOptions) async throws -> String + /// Open a streaming session fed live audio buffers — the recorder's path. func makeStreamingSession(options: TranscriptionOptions) async throws -> any TranscriptionSession } +/// Failures surfaced by the transcription layer. The recorder maps each to a +/// generic, actionable user message; the specific case stays in logs. enum TranscriptionError: Error { case notAuthorized case localeNotSupported(Locale) diff --git a/Relay Notes/Transcription/TranscriberFactory.swift b/Relay Notes/Transcription/TranscriberFactory.swift index b6791b9..3c441f6 100644 --- a/Relay Notes/Transcription/TranscriberFactory.swift +++ b/Relay Notes/Transcription/TranscriberFactory.swift @@ -27,6 +27,9 @@ final class TranscriberFactory { self.stores = stores } + /// The cached `Transcriber` for `engine`, constructing it on first request. + /// Apple Speech is cached independently; the two MLX engines share a single + /// slot, so selecting one evicts the other (see `liveMLXTranscriber`). func transcriber(for engine: TranscriptionEngine) -> any Transcriber { switch engine { case .apple: diff --git a/planning/swift-best-practices-review.md b/planning/swift-best-practices-review.md index 9e5a79a..f2d6fa2 100644 --- a/planning/swift-best-practices-review.md +++ b/planning/swift-best-practices-review.md @@ -20,36 +20,35 @@ record of what was found, what was applied, and what's left. Update it as items are worked. > **Build status.** The first pass (surgical + view/config) was authored in a Linux -> container with no Xcode toolchain; rounds **3** and **B** were authored *and* -> build+test verified locally (iPhone 17 Pro simulator, 61-test suite green). -> Everything in the **Applied** sections is now merged to `main` and compiles/tests -> green. Remaining items keep the same gate: `xcodebuild build` + `xcodebuild test` -> before merge — item **A** also needs a device `ParakeetSmoke` run. +> container with no Xcode toolchain; rounds **3**, **B**, and **4** were authored +> *and* build+test verified locally (iPhone 17 Pro simulator, 61-test suite green). +> Item **A** is additionally **device-validated** (iPhone 15 Pro Max, +> `ParakeetSmoke.run()` both gates green). Earlier rounds are merged to `main`; +> round 4 (F, G, docstrings) **and A** are build+test green (A also device-green) in +> the working tree, pending commit. **The review backlog is now empty** — every +> actionable item is applied; only the cosmetic H/I and the declined `preset → switch` +> were intentionally skipped. --- ## Remaining work — start here -Open items only — everything below is **applied** / **declined** / **left-alone**. -Priority order; each links to its detail section further down. - -- [ ] **Docstring coverage → ≥80%** (CodeRabbit gate). Comments-only, no build risk. - Backfill the most-touched API surface; document the non-obvious, **not** filler. - → *Docstring coverage*. -- [ ] **F — `RecorderViewModel.stopAndTranscribe()` teardown dedup.** Collapse the - two identical light teardowns via a local `let session` captured *before* - `session.finish()`. Live recording path → build + recorder tests (ideally device). - → *Lower-priority polish*. -- [ ] **G — `ModelStores` six-`init` seam → one defaulted init / `forTesting(...)`.** - Build/test gated (construction is used app-wide + in tests). → *Lower-priority polish*. -- [ ] **A — `ParakeetDecoder` per-step `fatalError` → stored `Linear`** (deferred). - Needs a device `ParakeetSmoke` run — MLXNN parameter-key derivation is silently - breakable here. → *Larger refactors → Deferred*. +**Nothing open — every actionable item is applied.** F, G, and the docstring pass +landed 2026-06-19 (build + test verified); **A** landed 2026-06-20 (device +`ParakeetSmoke` verified on the iPhone 15 Pro Max). See *Applied* below for each. + +- [x] **A — `ParakeetDecoder` per-step `fatalError` → `[UnaryLayer]` joint head** + (2026-06-20, device-validated). Resolved *better* than the original "stored + `Linear`" sketch: retyping `joint_net` `[Module]` → `[UnaryLayer]` lets the joint + apply `joint_net[2](…)` directly (no cast, no `fatalError`) with **provably zero** + parameter-tree change — MLXNN's `build(value:)` reflects the array by runtime + value, so keying is independent of the static element type. → *Larger refactors*. **Not doing** (recorded for completeness): **H** (error-type naming) and **I** (time formatter) — low value; **preset → `switch`** — declined (`SpeechTranscriber.Preset` is a struct, not a closed enum). **Gate for any code change:** `xcodebuild build` + -`xcodebuild test` (simulator suite); **A** additionally needs a device `ParakeetSmoke` run. +`xcodebuild test` (simulator suite); MLX-touching changes additionally need a device +`ParakeetSmoke` / `MLXSmoke` run. --- @@ -88,6 +87,17 @@ suite, both green on the iPhone 17 Pro simulator). No behavior change on any pat | **E — `deleteNotes(at:)` invariant** | `Views/NotesListView.swift` | Comment: the `IndexSet` offsets index into `filteredNotes` *because* that's the `ForEach` source; keep them the same array or swipe-delete hits the wrong rows. | | **J — `-1` HTTP-status sentinel** | `Transcription/DownloadableModelStore.swift` | `unexpectedHTTPStatus(-1)` (the "succeeded with no file" guard) → distinct `CoordinatorError.missingDownloadResult`; no longer masquerades as a real status in logs. Falls through to the generic network-failure arm. | +### Applied — round 4: F, G, docstrings (committed 2026-06-19, **build + test verified**) + +Closes the last simulator-validatable items. `xcodebuild build` + the 61-test +suite, both green on the iPhone 17 Pro simulator. No behavior change on any path. + +| Item | Files | Resolution | +|---|---|---| +| **G — `ModelStores` six-`init` seam** | `Transcription/ModelStores.swift` | Six overloads → **one `nil`-defaulted init** (`whisper:parakeet:cleanup:`, each `?? Real…Store()`). Every call site (production `ModelStores()` + the test combos) is unchanged. `nil` defaults sidestep the original "can't call the `@MainActor` store init from a nonisolated default-arg context" problem — the `??` fallbacks run in the `@MainActor` init body. | +| **F — `stopAndTranscribe()` teardown dedup** | `Recording/RecorderViewModel.swift` | The two identical light teardowns (guard-else + success) collapse into one up-front block: cancel `updatesTask`, capture `let session = self.session`, nil the stored `session`/`currentAudioURL`, then `finish()`/`cancel()` on the **local**. State is already `.finalizing`, so cancelling the partials loop early is a no-op. Catch paths now `await session.cancel()` on the local (the only work `cleanupAfterFailure()` did that wasn't already done up-front), so behavior is preserved without that heavier helper. | +| **Docstring coverage** | `ModelStores.swift`, `RecorderViewModel.swift`, `TranscriberFactory.swift`, `Transcriber.swift`, `AppleSpeechTranscriber.swift` | Backfilled `///` on the **transcriber entry-point spine + recording view model** — the model-store registry, the recording state machine + its public methods, the `Transcriber`/`TranscriptionSession` protocols (incl. the load-bearing two-method `Transcriber` rationale CLAUDE.md flags), the factory resolution entry point, and Apple Speech (was 0%, the default engine). **Targeted, not metric-chased:** the remaining low %s on these files are private DI storage, obvious `init`s, function-body locals, and protocol-conformance re-docs — documenting those is the filler the goal explicitly forbids. The CodeRabbit denominator is narrower than a naive symbol count (PR #19's files read 18–39% by a raw count but CodeRabbit reported 50%), so these land well above the raw numbers. | + --- ## Deliberately *not* changed @@ -115,16 +125,26 @@ Things the review flagged that are better left alone: Items 1, 3, and 5 from the original list are now **applied** (see above). The rest: -### Deferred — do with a compiler in the loop - -- **`Transcription/Parakeet/ParakeetDecoder.swift` — `fatalError` on the - per-decode-step hot path** (`joint_net[2] as? Linear`, inside - `callAsFunction`). Resolving the final `Linear` once into a stored property - would remove the per-step cast + crash path — but adding a second - `Linear`-typed property risks MLXNN's parameter-tree key derivation - (`joint_net`'s own comment warns that property-name-based weight keying here - is load-bearing and *silently* breakable). Do this with the `ParakeetSmoke` - device validation in the loop to confirm weights still load. +### Applied — formerly deferred (needed a compiler / device in the loop) + +- ~~**`ParakeetDecoder.swift` — `fatalError` on the per-decode-step hot path**~~ — + **APPLIED (A, 2026-06-20, device-validated on the iPhone 15 Pro Max).** The + original sketch (resolve the final `Linear` into a stored property) was *rejected* + as the wrong fix: a second `Linear`-typed stored property would alias `joint_net[2]`, + adding a duplicate parameter key (`joint.finalLinear.*`) that breaks `verify: .all` + / save round-trips — trading a crash path for a latent loading bug. **Instead:** + retype `let joint_net: [Module]` → `[UnaryLayer]` (all three slots — `ReLU`, + `ParakeetIdentity`, `Linear` — already conform), so `callAsFunction` applies the + trailing Linear as `joint_net[2](activated)` directly — no cast, no `fatalError`. + **Why this is safe (the part that needed verifying):** MLXNN keys array children + by *runtime value*, not static type — `ModuleValue.build(value:)` casts the array + to `[Any]` and inspects each element `as Module`, so `joint.joint_net.0/1/2` is + derived identically whether the property is `[Module]` or `[UnaryLayer]`. Confirmed + on device: `ParakeetSmoke.run()` loaded the full model under `verify: .noUnusedKeys` + (which throws if `joint.joint_net.2.*` goes unconsumed) and **both** gates passed — + T2.1d decode `substring check = PASS ✅`, T2.1e `chunking check = PASS ✅`, peak + 1345.5 MB (unchanged). Since it's the same `Linear` with the same weights, output + is numerically identical to the pre-change path. - ~~**KV-cache `typealias`**~~ — **APPLIED (B, 2026-06-19, build + test verified).** Introduced `WhisperKV = (MLXArray, MLXArray)` (one attention's keys/values) and `WhisperLayerKVCache = (WhisperKV?, WhisperKV?)` (a decoder block's self+cross @@ -145,22 +165,15 @@ rest: ## Lower-priority polish -_Items C, D, E, J above are now **applied** (round 3). The rest:_ - -- **`RecorderViewModel.stopAndTranscribe()`** (F) repeats the - `updatesTask`/`session`/`currentAudioURL` teardown across the guard-else and - success paths (the catch path's `cleanupAfterFailure()` is a *heavier* - teardown — `await session.cancel()` + the feed/interruption/elapsed tasks — and - stays separate). **Correction to the original note:** you can't "cancel once - up-front and nil `session`" — the success path calls `try await session.finish()` - first. The real fix captures a local `let session = self.session`, nils the - stored property + cancels `updatesTask` immediately after `engine.stop()`, then - calls `.finish()` on the local. Collapses the two identical light copies into - one. Touches the live recording path → build + recorder tests (ideally device). -- **`ModelStores`** (G) has six `init` overloads as a test seam — a single - defaulted init or a `static func forTesting(...)` factory would be more pragmatic - and avoid silently spinning up real filesystem-backed stores for unspecified - slots. Build/test gated (construction is used app-wide + in tests). +_Items C, D, E, J (round 3) and **F, G (round 4)** above are now **applied**. The rest:_ + +- ~~**`RecorderViewModel.stopAndTranscribe()`** (F)~~ — **APPLIED (round 4).** The + local-`session` capture collapsed the two light teardowns into one up-front + block; catch paths `cancel()` the local. See the round-4 table for the full + resolution (and why `cleanupAfterFailure()` wasn't needed on the catch path). +- ~~**`ModelStores`** (G)~~ — **APPLIED (round 4)** as the single `nil`-defaulted + init (not the `forTesting(...)` factory — defaulted is the more idiomatic + collapse, and every call site already used labeled args). See the round-4 table. - **Error-type naming inconsistency** (H) — `WhisperModelError` (free-standing) vs nested `WhisperAudio.Error` / `WhisperTokenizer.Error` (which force `Swift.Error` qualification). Pick the free-standing form for consistency. @@ -177,6 +190,14 @@ _Items C, D, E, J above are now **applied** (round 3). The rest:_ ## Docstring coverage (CodeRabbit gate) +> **Status: first pass applied (round 4, 2026-06-19).** Backfilled the transcriber +> entry-point spine + recording view model (`ModelStores`, `RecorderViewModel`, +> `TranscriberFactory`, `Transcriber`, `AppleSpeechTranscriber`). This is a +> *per-PR moving target*, not a one-and-done — each future PR re-measures its own +> changed set, so the durable fix is the standing rule below (every new +> public/internal decl ships with a doc comment). The rest of this section is the +> rationale that guided the pass. + CodeRabbit's pre-merge checks (CHILL profile) flagged **docstring coverage at 50% against an 80% threshold** on PR #19's changed set. It's a *soft warning* — it didn't block the merge — but it names a real open-source-readiness gap worth