Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 5 additions & 0 deletions CHANGE_LOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -165,6 +165,11 @@ Entry style: bold lead-in summarizing what shipped, then the *why* / non-obvious
- **Swift best-practices pass, round 2 — view/config dedup (open-source readiness).** Took on the lower-risk "larger refactors" from `planning/swift-best-practices-review.md`. (1) **Model-download sections deduped** — `WhisperModelSection`/`ParakeetModelSection`/`CleanupModelSection` were ~90% identical (status switch, download/cancel/delete, confirm-alert); extracted `Views/ModelDownloadSection.swift` typed on the `@Observable` `DownloadableModelStore` base, leaving the three as thin wrappers that supply only copy + `onDeleted`. `WhisperModelSection.failureMessage` kept (test-referenced); `SettingsView` call sites unchanged; auto-included via the synchronized group (no pbxproj edit). View composition only — provider abstraction untouched. (2) **`SettingsView.engineRow(_:title:subtitle:isEnabled:)`** extracted from three byte-identical engine `Button` rows. (3) **Parakeet config chain `Codable` → `Decodable`** + deleted the dead `ParakeetDecodingConfig.encode(to:)` (configs are decode-only — no `JSONEncoder` in the tree; the custom encoder only existed because `maxSymbols` has no `CodingKeys` case, and the whole chain moved since the parent's synthesized `Encodable` required the child's). **Deferred** (do with a compiler / `ParakeetSmoke` in the loop): the `ParakeetDecoder` per-step `fatalError` → stored-`Linear` fix (MLXNN parameter-tree keying is load-bearing) and the Whisper KV-cache `typealias` (cosmetic, high-churn, type-label risk). **Declined:** `presetLabel`/`presetID` → `switch` — `SpeechTranscriber.Preset` is an Equatable struct, not a closed enum. **Not compiled here** — needs an Xcode build + test run.
- **Swift best-practices pass, round 3 — lower-priority polish (open-source readiness), build- and test-verified.** First of the best-practices passes actually compiled + tested locally (`xcodebuild build` + the 61-test suite, both green on the iPhone 17 Pro simulator). No behavior change on any path. Items, keyed to `planning/swift-best-practices-review.md`: (C) **"GB mismatch" was a false alarm** — the cleanup-model spec is authoritatively `downloadSizeMB: 3446` (≈3.4 GB on disk, matching the UI copy and the summed `RemoteFile.size`s); the two stray "~2.7 GB" comments (`Cleaner.evict`, `NoteDetailView.onDisappear`) describe *resident memory freed*, a genuinely different quantity (Gemma E2B's per-layer-embedding/MatFormer design makes RAM < disk), so they were clarified as "resident memory" rather than forced to match the download. (D) **Magic numbers named** — `ParakeetAudio`'s two same-valued `1e-5`s split into `logMelFloor` (keeps `log()` finite on silent bins) vs `normEpsilon` (divide-by-zero guard in z-scoring); plus `AudioPlayer`'s `endSnapThreshold` (0.05 s end-snap), `LiveAudioEngine`'s `resamplerHeadroom` (1024-frame converter slack), `Note`'s `tieBreakNudge` (1 ms same-instant ordering tie-break). (E) **`NotesListView.deleteNotes(at:)`** gained an invariant comment: the `IndexSet` offsets index into `filteredNotes` *because* that's the `ForEach` source — keep them the same array or swipe-delete hits the wrong rows. (J) **`DownloadCoordinator`'s `unexpectedHTTPStatus(-1)` sentinel** (the "can't happen, download succeeded with no file" guard) replaced by a distinct `CoordinatorError.missingDownloadResult` so a fake `-1` never reads as a real HTTP status in logs; it falls through to the generic network-failure arm. **Deferred** (held for a focused build/test pass): item B (KV-cache `typealias`), F (`stopAndTranscribe` teardown dedup — note the doc's "cancel up-front" can't nil `session` before `session.finish()`; capture a local), G (`ModelStores` init seam). **Left alone:** A (`ParakeetDecoder` stored `Linear` — MLXNN param-key risk), H (error-type naming churn), I (formatter consolidation — stale framing: `RecorderView` already delegates to `RecorderViewModel.formatElapsed`).
- **Swift best-practices, item B — KV-cache `typealias` (build + test verified).** Named the Whisper decoder's cryptic cache tuples: `WhisperKV = (MLXArray, MLXArray)` (one attention's keys/values) and `WhisperLayerKVCache = (WhisperKV?, WhisperKV?)` (a decoder block's self- + cross-attention cache), declared at the top of `Transcription/WhisperModel.swift` and substituted at all 11 sites across `WhisperModel.swift` + `WhisperDecoding.swift` (the `ResidualAttentionBlock`/`TextDecoder`/`WhisperAttention` signatures, the `[WhisperLayerKVCache]` per-block arrays, and `WhisperDecoding`'s decode loop). Replaces the repeated `((MLXArray, MLXArray)?, (MLXArray, MLXArray)?)` so element chains like `kvCache[0].0` (block 0's self-attn KV) read against a named type. Pure compile-time substitution — identical underlying types, zero runtime change — so the simulator build + 61-test suite fully verify it without a device run. This was the last of the "deferred — needs a compiler in the loop" items worth taking; A (`ParakeetDecoder` stored `Linear`) stays deferred on the MLXNN param-key risk.
- **Swift best-practices, round 4 — F, G, docstrings (build + test verified).** Closes the last simulator-validatable items from `planning/swift-best-practices-review.md`; `xcodebuild build` + the 61-test suite green on the iPhone 17 Pro simulator, no behavior change on any path. (G) **`ModelStores`' six `init` overloads → one `nil`-defaulted init** (`whisper:parakeet:cleanup:`, each slot `?? Real…Store()`). The `nil` defaults sidestep the original blocker (a non-`nil` default arg is evaluated in a nonisolated context and can't call the `@MainActor` store init) by deferring construction to the `??` fallbacks in the `@MainActor` init body; every call site (production `ModelStores()` + the labeled test combos in `ModelStoresTests`/`ReTranscriberTests`/`WhisperMLXTranscriberTests`) is unchanged. (F) **`RecorderViewModel.stopAndTranscribe()` teardown dedup** — the two identical light teardowns (guard-else + success) collapse into one up-front block: cancel `updatesTask`, capture `let session = self.session`, nil the stored `session`/`currentAudioURL`, then `finish()`/`cancel()` on the *local*. State is already `.finalizing`, so cancelling the partials loop early is a no-op; the catch paths now `await session.cancel()` on the local (the only `cleanupAfterFailure()` work not already done up-front), preserving behavior without the heavier helper. (Docstrings) **First docstring-coverage pass** (CodeRabbit CHILL gate) — backfilled `///` on the transcriber entry-point spine + recording view model: `ModelStores`, `RecorderViewModel` (state machine + public methods + deps), `TranscriberFactory.transcriber(for:)`, the `Transcriber`/`TranscriptionSession` protocols (incl. the load-bearing two-method `Transcriber` rationale), and `AppleSpeechTranscriber` (was 0% — the default engine). Targeted, not metric-chased: the gate re-measures each PR's changed set, so the durable fix is the standing "every new public/internal decl ships a doc comment" rule, not filler on private storage / obvious `init`s. Only item **A** (`ParakeetDecoder` stored `Linear`) now remains, deferred on the device-only `ParakeetSmoke` validation.

## 2026-06-20

- **Swift best-practices, item A — `ParakeetDecoder` per-step `fatalError` removed (device-validated). Closes the review backlog.** The joint network's `callAsFunction` cast `joint_net[2] as? Linear` on every decode step and `fatalError`'d on miss. The review's original sketch — resolve the final `Linear` into a stored property — was **rejected**: a second `Linear`-typed property aliases `joint_net[2]`, adding a duplicate parameter key (`joint.finalLinear.*`) that breaks `update(verify: .all)` / save round-trips, trading a crash path for a latent loading bug. **Applied instead:** retype `let joint_net: [Module]` → `[UnaryLayer]` (all three slots — `ReLU`, `ParakeetIdentity`, `Linear` — already conform to `UnaryLayer`), so the joint applies the trailing Linear as `joint_net[2](activated)` directly — no cast, no `fatalError`, no new stored property. **Why the `joint_net` snake_case key survives the retype** (the load-bearing risk the review flagged): MLXNN keys array children by *runtime value*, not static element type — `MLXNN.ModuleValue.build(value:)` casts the array to `[Any]` and inspects each element `as Module`, so `joint.joint_net.0/1/2` is derived identically for `[Module]` and `[UnaryLayer]` (verified by reading mlx-swift 0.31.4 `Source/MLXNN/Module.swift`). Since it's the same `Linear` with the same weights applied the same way, output is numerically identical to the pre-change path. **Validated on the iPhone 15 Pro Max** (CLI device build → `devicectl install` → `ParakeetSmoke.run()` via the `#if DEBUG` Settings button, captured over `idevicesyslog`): full model loaded under `verify: .noUnusedKeys` (which *throws* if `joint.joint_net.2.*` goes unconsumed — it didn't), T2.1d decode `substring check ("openly shouldered the burden") = PASS ✅` (282 ms), T2.1e `chunking check = PASS ✅`, peak process footprint 1345.5 MB (unchanged, still fits without `increased-memory-limit`). Simulator build + 61-test suite also green. With this the `planning/swift-best-practices-review.md` backlog is empty — every actionable item applied; only cosmetic H/I and the declined `preset → switch` were intentionally skipped.

## 2026-06-17

Expand Down
59 changes: 48 additions & 11 deletions Relay Notes/Recording/RecorderViewModel.swift
Original file line number Diff line number Diff line change
Expand Up @@ -3,9 +3,17 @@ import Foundation
import Observation
import SwiftData

/// Drives the tap-to-record flow: owns the capture engine + streaming
/// transcription session, advances the recording state machine, and persists the
/// finished `Note`. Constructed lazily in `ContentView.task` and read by
/// `RecorderView`.
@MainActor
@Observable
final class RecorderViewModel {
/// The recording state machine. `partial` carries the latest streamed
/// transcript through `.recording`/`.paused` (empty for non-streaming
/// engines); `.paused` is driven by `AVAudioSession` interruptions, not the
/// user. See the "Tap-to-record state machine" note in CLAUDE.md.
enum State: Equatable {
case idle
case recording(partial: String)
Expand All @@ -15,6 +23,7 @@ final class RecorderViewModel {
case failed(message: String)
}

/// Current point in the recording flow; the single source of truth `RecorderView` renders.
private(set) var state: State = .idle

/// Whether the active recording's engine streams live partials. Captured
Expand All @@ -34,16 +43,24 @@ final class RecorderViewModel {
/// during interruption pauses). Shown in the Whisper placeholder.
private(set) var elapsed: Duration = .zero

/// Live recording/transcription dials, also surfaced to `SettingsView`. A
/// snapshot is read at record start, so mid-recording edits don't take effect
/// until the next session (see `Tunings`).
let tunings: Tunings

private let engine: LiveAudioEngine
private let transcriberFactory: TranscriberFactory
private let modelContext: ModelContext

/// The active streaming session, set at record start and released at finalize.
private var session: (any TranscriptionSession)?
/// Forwards captured audio buffers to the session (and drives the meter).
private var feedTask: Task<Void, Never>?
/// Mirrors the session's partial-transcript stream into `state`.
private var updatesTask: Task<Void, Never>?
/// Relays `AVAudioSession` interruption events into the state machine.
private var interruptionTask: Task<Void, Never>?
/// Accumulates `elapsed` for the non-streaming placeholder; nil otherwise.
private var elapsedTask: Task<Void, Never>?
private var currentAudioURL: URL?

Expand All @@ -59,6 +76,10 @@ final class RecorderViewModel {
self.tunings = tunings
}

/// Starts a streaming session and audio capture, wiring the feed, partials,
/// interruption, and (for non-streaming engines) elapsed-ticker tasks, then
/// enters `.recording`. Any failure tears everything down and lands in
/// `.failed` with a generic, actionable message (specifics stay in logs).
func startRecording() async {
do {
let transcriber = transcriberFactory.transcriber(for: tunings.engine)
Expand Down Expand Up @@ -123,6 +144,10 @@ final class RecorderViewModel {
}
}

/// Stops capture, finalizes the session into a transcript, and persists a
/// `Note` (audio + transcript + model provenance). No-op unless currently
/// `.recording`/`.paused`. On a missing recording or a finalize/empty-speech
/// failure, deletes the orphaned audio and lands in `.failed`.
func stopAndTranscribe() async {
switch state {
case .recording, .paused: break
Expand All @@ -138,22 +163,26 @@ final class RecorderViewModel {
elapsedTask?.cancel()
elapsedTask = nil

// Light teardown shared by both the guard-else failure and the success
// paths: cancel the partials loop and release the stored session up-front,
// finishing on a local handle. State is already `.finalizing`, so the
// partials loop is a no-op from here — cancelling it early changes nothing.
// (The catch paths still need this `session` to `cancel()` it, hence the
// local; they don't route through `cleanupAfterFailure()`, whose remaining
// work is already done here.)
updatesTask?.cancel()
updatesTask = nil
let session = self.session
self.session = nil
currentAudioURL = nil

guard let session, let url else {
updatesTask?.cancel()
updatesTask = nil
self.session = nil
currentAudioURL = nil
state = .failed(message: "Recording could not be saved. Please try again.")
return
Comment on lines +175 to 181

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Handle partial guard failure without leaking session/audio cleanup.

Line 179’s combined guard let session, let url drops into .failed without canceling an existing session or deleting an existing audio file when only one of the two is missing. This leaves a resource-leak/orphan-file gap exactly in the path this refactor centralized.

Suggested fix
-        let session = self.session
-        self.session = nil
-        currentAudioURL = nil
-
-        guard let session, let url else {
+        let session = self.session
+        self.session = nil
+        let fallbackURL = currentAudioURL
+        currentAudioURL = nil
+
+        guard let session else {
+            if let orphanURL = url ?? fallbackURL {
+                try? FileManager.default.removeItem(at: orphanURL)
+            }
+            state = .failed(message: "Recording could not be saved. Please try again.")
+            return
+        }
+        guard let url = url ?? fallbackURL else {
+            await session.cancel()
             state = .failed(message: "Recording could not be saved. Please try again.")
             return
         }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
let session = self.session
self.session = nil
currentAudioURL = nil
guard let session, let url else {
updatesTask?.cancel()
updatesTask = nil
self.session = nil
currentAudioURL = nil
state = .failed(message: "Recording could not be saved. Please try again.")
return
let session = self.session
self.session = nil
let fallbackURL = currentAudioURL
currentAudioURL = nil
guard let session else {
if let orphanURL = url ?? fallbackURL {
try? FileManager.default.removeItem(at: orphanURL)
}
state = .failed(message: "Recording could not be saved. Please try again.")
return
}
guard let url = url ?? fallbackURL else {
await session.cancel()
state = .failed(message: "Recording could not be saved. Please try again.")
return
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@Relay` Notes/Recording/RecorderViewModel.swift around lines 175 - 181, The
combined guard statement on line 179 checking `guard let session, let url`
returns immediately without properly cleaning up resources when either value is
missing. If session exists but url is nil, or url exists but session is nil, the
resources remain orphaned. Instead of a single combined guard check, validate
session and url separately - if session is present, cancel it; if url is
present, delete the audio file - and only then set state to .failed. This
ensures complete resource cleanup regardless of which value is missing.

}

do {
let transcript = try await session.finish()
updatesTask?.cancel()
updatesTask = nil
self.session = nil
currentAudioURL = nil

let note = Note(
audioFilename: url.lastPathComponent,
transcript: transcript,
Expand All @@ -163,16 +192,18 @@ final class RecorderViewModel {
try modelContext.save()
state = .finished(transcript: transcript)
} catch TranscriptionError.noSpeechDetected {
await cleanupAfterFailure()
await session.cancel()
try? FileManager.default.removeItem(at: url)
state = .failed(message: "We didn't hear any speech. Try again and speak clearly.")
} catch {
await cleanupAfterFailure()
await session.cancel()
try? FileManager.default.removeItem(at: url)
state = .failed(message: "Something went wrong transcribing your note. Please try again.")
}
}

/// Returns to `.idle` after the user dismisses a `.finished`/`.failed` result,
/// readying the recorder for the next take.
func reset() {
state = .idle
}
Expand All @@ -192,6 +223,9 @@ final class RecorderViewModel {
}
}

/// Applies an `AVAudioSession` interruption: pure pause/resume transitions via
/// `nextState`, or — on `.stopped` while paused — auto-finalizes so the
/// captured audio isn't lost (there's no manual-resume affordance).
private func handleInterruption(_ event: InterruptionEvent) async {
if let next = Self.nextState(for: event, from: state) {
state = next
Expand Down Expand Up @@ -264,6 +298,9 @@ final class RecorderViewModel {
return String(format: "%d:%02d", totalSeconds / 60, totalSeconds % 60)
}

/// Heavy teardown for the failure paths: cancels every task and `cancel()`s
/// the session (vs. the lighter inline teardown in `stopAndTranscribe()`'s
/// success path, which `finish()`es the session instead).
private func cleanupAfterFailure() async {
feedTask?.cancel()
feedTask = nil
Expand Down
16 changes: 16 additions & 0 deletions Relay Notes/Transcription/AppleSpeechTranscriber.swift
Original file line number Diff line number Diff line change
Expand Up @@ -2,13 +2,19 @@ import AVFoundation
import Foundation
import Speech

/// `Transcriber` over Apple's on-device `SpeechAnalyzer`/`SpeechTranscriber` — the
/// app's default engine. `nonisolated` (no MLX, no shared mutable state); both the
/// file-based and streaming paths request authorization, resolve a supported
/// locale, install the on-device model assets, then run the analyzer.
nonisolated final class AppleSpeechTranscriber: Transcriber {
private let locale: Locale

init(locale: Locale = .current) {
self.locale = locale
}

/// One-shot file transcription via `SpeechAnalyzer.analyzeSequence(from:)`,
/// joining recognized segments. Throws `.noSpeechDetected` on an empty result.
func transcribe(_ audio: URL, options: TranscriptionOptions) async throws -> String {
guard case .apple(let appleOptions) = options else {
preconditionFailure("AppleSpeechTranscriber received non-apple options — factory and engine selection are out of sync")
Expand Down Expand Up @@ -79,6 +85,8 @@ nonisolated final class AppleSpeechTranscriber: Transcriber {
return transcript
}

/// Opens a live session: starts the analyzer over an `AnalyzerInput` stream and
/// returns an `AppleSpeechSession` that yields volatile + finalized partials.
func makeStreamingSession(options: TranscriptionOptions) async throws -> any TranscriptionSession {
guard case .apple(let appleOptions) = options else {
preconditionFailure("AppleSpeechTranscriber received non-apple options — factory and engine selection are out of sync")
Expand Down Expand Up @@ -144,6 +152,10 @@ nonisolated final class AppleSpeechTranscriber: Transcriber {
}
}

/// Live Apple Speech session. A background task drains `transcriber.results`,
/// yielding a running transcript to `updates`: volatile (in-progress) results are
/// appended after the finalized prefix so the live card updates as you speak, and
/// each `isFinal` result is committed into that prefix.
private final class AppleSpeechSession: TranscriptionSession {
let audioFormat: AVAudioFormat?
let updates: AsyncStream<String>
Expand Down Expand Up @@ -203,6 +215,8 @@ private final class AppleSpeechSession: TranscriptionSession {
inputContinuation.yield(AnalyzerInput(buffer: buffer))
}

/// Closes input, finalizes the analyzer through end-of-input, and returns the
/// committed transcript. Throws `.noSpeechDetected` when it's empty.
func finish() async throws -> String {
inputContinuation.finish()
do {
Expand All @@ -218,6 +232,8 @@ private final class AppleSpeechSession: TranscriptionSession {
return transcript
}

/// Tears the session down without a result: closes input, cancels the analyzer
/// and the results task, and finishes the `updates` stream.
func cancel() async {
inputContinuation.finish()
await analyzer.cancelAndFinishNow()
Expand Down
Loading