
WhisperMLX: revisit in-memory PCM buffering for long recordings #1

Description

@samkeen

Context

The first cut of WhisperMLXTranscriber.makeStreamingSession accumulates PCM buffers in memory across the lifetime of a single recording. Whisper's file-based decode runs once at finish().
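The buffering pattern described above can be sketched as follows, assuming the session receives mono float32 samples at 16 kHz. `InMemoryPCMSession`, `feed(_:)`, and `finish(decode:)` are hypothetical stand-ins, not the real `WhisperMLXTranscriber` API. The sketch shows that every fed sample stays resident until `finish` hands the whole recording to a single decode.

```swift
// Hypothetical stand-in for the first-cut session (not the real
// WhisperMLXTranscriber API): every fed chunk is appended to one growing
// array, and nothing is released until finish() hands it off for one decode.
final class InMemoryPCMSession {
    let sampleRate: Double
    private var samples: [Float] = []

    init(sampleRate: Double = 16_000) {
        self.sampleRate = sampleRate
    }

    /// Append one chunk of mono float32 samples (e.g. from an audio tap).
    func feed(_ chunk: [Float]) {
        samples.append(contentsOf: chunk)
    }

    /// Seconds of audio currently resident in memory.
    var bufferedSeconds: Double {
        Double(samples.count) / sampleRate
    }

    /// Bytes used by the live samples (ignores spare array capacity).
    var bufferedBytes: Int {
        samples.count * MemoryLayout<Float>.size
    }

    /// Run a single decode over the whole recording, then drop the buffer.
    func finish(decode: ([Float]) -> String) -> String {
        let transcript = decode(samples)
        samples.removeAll()
        return transcript
    }
}
```

Because `Array` grows by reallocating and copying, the allocated capacity (and the transient peak during a copy) can sit above `bufferedBytes`. Calling `reserveCapacity(_:)` up front is one way to avoid repeated copies when the maximum length is known.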

Why this is OK for the first cut

  • Simplicity: no scratch-file I/O, no second writer competing with LiveAudioEngine's AAC writer.
  • Target device is the iPhone 15 Pro Max (8 GB RAM). 16 kHz mono float32 is 64,000 bytes/s, so 30 min of audio is ~115 MB, which leaves comfortable headroom.
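The ~115 MB figure can be checked with plain arithmetic. This Swift snippet uses a hypothetical helper with no audio APIs. It counts raw sample bytes only and does not include the Whisper model, decoder working memory, or array growth slack.

```swift
// Raw PCM size = seconds × sample rate × channels × bytes per sample.
// Defaults assume the issue's format: 16 kHz, mono, float32 (4 bytes).
func pcmBytes(seconds: Int, sampleRate: Int = 16_000,
              channels: Int = 1, bytesPerSample: Int = 4) -> Int {
    seconds * sampleRate * channels * bytesPerSample
}

let thirtyMinutes = pcmBytes(seconds: 30 * 60)
let sixtyMinutes = pcmBytes(seconds: 60 * 60)
print(thirtyMinutes)  // 115200000 bytes, ~115 MB (~110 MiB)
print(sixtyMinutes)   // 230400000 bytes, ~230 MB
```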

When it stops being OK

  • Recordings longer than ~30 min on the iPhone 15 Pro Max.
  • Any length on lower-RAM devices, if the device scope expands beyond the 15 Pro Max.
  • Memory-pressure warnings under dogfood.

Options when revisiting

  1. Stream PCM to a scratch WAV during feed, read back at finish. Lower peak memory, more disk I/O. Cleanest separation from the AAC writer.
  2. Reuse the in-progress AAC/m4a file that LiveAudioEngine already writes. Decode it back to PCM at finalize. Avoids the second writer entirely, but couples the transcriber to the recording-file format.
  3. Chunked decode with interim transcripts. Out of scope for the no-streaming first cut, but the natural follow-up if we want live partials from Whisper.
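Option 1 can be sketched as follows, assuming mono float32 at 16 kHz and a little-endian host. `ScratchWAVWriter` and its methods are hypothetical names. The sketch uses only Foundation so it runs off-device; on iOS, `AVAudioFile` would likely be the more idiomatic way to write and read the scratch file. The 44-byte header uses WAV format tag 3 (IEEE float), so samples round-trip without conversion.

```swift
import Foundation

// Hypothetical scratch-file writer for option 1: append mono float32
// samples to a WAV file during feed, then read them back at finish.
// Assumes a little-endian host (true for Apple silicon and x86).
final class ScratchWAVWriter {
    let url: URL
    let sampleRate: UInt32
    private let handle: FileHandle
    private var dataBytes: UInt32 = 0

    init(url: URL, sampleRate: UInt32 = 16_000) throws {
        self.url = url
        self.sampleRate = sampleRate
        _ = FileManager.default.createFile(atPath: url.path, contents: nil)
        handle = try FileHandle(forWritingTo: url)
        // Placeholder header; sizes are patched in close().
        try handle.write(contentsOf: Self.header(sampleRate: sampleRate, dataBytes: 0))
    }

    /// Append one chunk; only this chunk needs to be held in memory.
    func append(_ samples: [Float]) throws {
        let bytes = samples.withUnsafeBufferPointer { Data(buffer: $0) }
        try handle.write(contentsOf: bytes)
        dataBytes += UInt32(bytes.count)
    }

    /// Rewrite the header with the final RIFF/data sizes and close the file.
    func close() throws {
        try handle.seek(toOffset: 0)
        try handle.write(contentsOf: Self.header(sampleRate: sampleRate, dataBytes: dataBytes))
        try handle.close()
    }

    /// Read the samples back, skipping the 44-byte header written above.
    static func readSamples(from url: URL) throws -> [Float] {
        let payload = try Data(contentsOf: url).dropFirst(44)
        var samples = [Float](repeating: 0, count: payload.count / MemoryLayout<Float>.size)
        _ = samples.withUnsafeMutableBytes { payload.copyBytes(to: $0) }
        return samples
    }

    /// Minimal 44-byte header: RIFF/WAVE, 16-byte fmt chunk, format tag 3 (IEEE float).
    private static func header(sampleRate: UInt32, dataBytes: UInt32) -> Data {
        var d = Data()
        func put<T: FixedWidthInteger>(_ v: T) {
            withUnsafeBytes(of: v.littleEndian) { d.append(contentsOf: $0) }
        }
        d.append(contentsOf: Array("RIFF".utf8)); put(UInt32(36) + dataBytes)
        d.append(contentsOf: Array("WAVE".utf8))
        d.append(contentsOf: Array("fmt ".utf8)); put(UInt32(16))
        put(UInt16(3)); put(UInt16(1))       // format tag: IEEE float; channels: mono
        put(sampleRate); put(sampleRate * 4) // sample rate; byte rate (4 bytes/frame)
        put(UInt16(4)); put(UInt16(32))      // block align; bits per sample
        d.append(contentsOf: Array("data".utf8)); put(dataBytes)
        return d
    }
}
```

In this sketch `readSamples(from:)` loads the whole recording back at once. The lower peak memory that option 1 promises therefore depends on the decode side reading the file itself (or in windows), rather than through a helper like this.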

Trigger to act

  • Real notes regularly exceed ~20 min, OR
  • Memory-pressure warnings in os_log/Instruments during dogfood, OR
  • Device scope expands beyond the iPhone 15 Pro Max.
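The first trigger is easiest to judge with data from real notes. The following is a minimal sketch of how a session might flag recordings that cross the ~20 min mark. `BufferDurationMonitor` and `record(sampleCount:)` are hypothetical. The default threshold and the 16 kHz rate come from this issue.

```swift
// Hypothetical dogfood instrumentation for the "~20 min" trigger: tracks how
// much audio a session has buffered and reports the crossing exactly once.
struct BufferDurationMonitor {
    let thresholdSeconds: Double
    let sampleRate: Double
    private(set) var bufferedSamples = 0
    private(set) var didCross = false

    init(thresholdSeconds: Double = 20 * 60, sampleRate: Double = 16_000) {
        self.thresholdSeconds = thresholdSeconds
        self.sampleRate = sampleRate
    }

    /// Record one fed chunk; returns true only for the chunk that crosses the threshold.
    mutating func record(sampleCount: Int) -> Bool {
        bufferedSamples += sampleCount
        let seconds = Double(bufferedSamples) / sampleRate
        guard !didCross, seconds >= thresholdSeconds else { return false }
        didCross = true
        return true
    }
}
```

In a dogfood build, the `true` return is a natural place to emit an `os_log` entry. That way, the share of notes crossing the threshold shows up alongside any memory-pressure warnings. This sketch does not model system memory-pressure signals themselves (for example, via `DispatchSource.makeMemoryPressureSource`).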

References

  • planning/notes.md — Transcription upgrades section (added in the same plan that created this issue).
  • planning/transcription-tuning.md — Tier 2 (Local ASR via MLX).

Metadata


Assignees

No one assigned

Labels

enhancement (New feature or request)

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
