SapoWhisper is a small macOS menu bar app for fast speech-to-text.
Press Option + Space, speak, press it again, and the transcript is pasted into the app you were using.
- Global hotkey recording with a compact floating overlay.
- Auto-paste via clipboard + Cmd+V; no live typing while you speak.
- Local and cloud transcription engines.
- Searchable history with saved audio, cancelled-recording recovery, replay, download, pinning, and re-transcription.
- Preferred microphone sync, route-change resilience, gain control, and optional auto-ducking.
- Batch audio upload quality profiles, from ultra-fast compact WAVs to native Float32.
- Optional AI polish through any OpenAI-compatible provider (OpenRouter by default) with a built-in fidelity guard and a shared output-language picker that can keep the audio language or translate faithfully to the selected target.
- Guided setup for Microphone and Accessibility permissions.

| Engine | Mode | Best for |
|---|---|---|
| WhisperKit | Local | Private offline transcription. |
| Local AI Server (NVIDIA) | Batch LAN | Offloading transcription to a local NVIDIA GPU server with an OpenAI-style STT endpoint. |
| Deepgram Nova-3 | Batch | High-accuracy cloud transcription. |
| Deepgram Flux Live | Realtime | Low-latency streaming with WAV backup. |
| ElevenLabs Scribe v2 | Batch | Accurate Scribe transcription. |
| ElevenLabs Scribe Realtime v2 | Realtime | Low-latency Scribe with committed-text buffering. |
Cloud and optional local-server credentials are stored locally on the user's Mac. Never commit API keys, exported recordings, logs, DMGs, archives, or signing files.
Batch engines use the selected microphone upload-quality profile: Ultra fast (16 kHz Int16), Medium by default (24 kHz Int16), High (native rate up to 48 kHz Int16), or Ultra original (native Float32). Realtime engines keep their required 16 kHz Int16 streaming format.
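
To give a feel for what the profiles mean for upload size, here is a rough back-of-the-envelope sketch. It assumes mono audio, uncompressed PCM, no WAV header overhead, and a 48 kHz native rate for the High and Ultra original profiles; none of these assumptions are confirmed by the app, and the function names are made up for illustration.

```shell
#!/usr/bin/env bash
# Rough uncompressed PCM payload per minute of audio.
# Assumptions (not taken from the app): mono, no container overhead,
# and a 48 kHz native sample rate for the last two profiles.

# Usage: pcm_bytes_per_minute SAMPLE_RATE_HZ BYTES_PER_SAMPLE [CHANNELS]
pcm_bytes_per_minute() {
  local sample_rate_hz=$1 bytes_per_sample=$2 channels=${3:-1}
  echo $(( sample_rate_hz * bytes_per_sample * channels * 60 ))
}

# Print one line per profile: label, then bytes per minute.
print_profile_sizes() {
  printf '%-34s %s\n' "Ultra fast (16 kHz Int16)"      "$(pcm_bytes_per_minute 16000 2)"
  printf '%-34s %s\n' "Medium (24 kHz Int16)"          "$(pcm_bytes_per_minute 24000 2)"
  printf '%-34s %s\n' "High (48 kHz Int16)"            "$(pcm_bytes_per_minute 48000 2)"
  printf '%-34s %s\n' "Ultra original (48 kHz Float32)" "$(pcm_bytes_per_minute 48000 4)"
}
```

Under those assumptions, calling `print_profile_sizes` gives roughly 1.9 MB per minute for Ultra fast and 11.5 MB per minute for Ultra original, a sixfold difference in upload payload.
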
TestAssets/LocalAITranscription/ contains public synthetic WAV fixtures for local STT testing:

- longform/sample-1m.wav
- longform/sample-2m.wav
- longform/sample-3m.wav
- longform/sample-6m.wav
- technical/en/*.wav
- technical/es/*.wav

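
As a sketch, the longform fixtures can be fed one at a time to scripts/local_stt_benchmark.sh. The loop below assumes the script reads BASE_URL, MODEL_ID, and AUDIO_PATH from the environment and exits non-zero on failure. BENCH_SCRIPT and DRY_RUN are hypothetical knobs added for this example only; they are not options of the real script.

```shell
#!/usr/bin/env bash
# Sketch: run the benchmark script once per longform fixture.
# BASE_URL and MODEL_ID must already be exported by the caller.
# BENCH_SCRIPT and DRY_RUN are hypothetical, for illustration only.

benchmark_longform() {
  local fixture_dir=${1:-TestAssets/LocalAITranscription/longform}
  local bench=${BENCH_SCRIPT:-scripts/local_stt_benchmark.sh}
  local wav
  for wav in "$fixture_dir"/*.wav; do
    [ -e "$wav" ] || continue            # glob matched nothing
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "AUDIO_PATH=$wav $bench"      # show what would run
    else
      AUDIO_PATH="$wav" "$bench" || return 1   # stop at first failure
    fi
  done
}
```

Running `DRY_RUN=1 benchmark_longform` first prints the planned invocations without contacting the server.
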
Use scripts/local_stt_benchmark.sh with any OpenAI-compatible local STT server:
BASE_URL=http://YOUR_SERVER_IP:8000 \
MODEL_ID=rtlingo/mobiuslabsgmbh-faster-whisper-large-v3-turbo \
AUDIO_PATH=TestAssets/LocalAITranscription/longform/sample-1m.wav \
scripts/local_stt_benchmark.sh

- macOS 14.0 or later
- Apple Silicon Mac (arm64, M1 and newer)
- Xcode with command line tools
- Microphone permission
- Accessibility permission for auto-paste
git clone <repo-url>
cd SapoWhisper
make tools
make ci-check

Open SapoWhisper.xcodeproj in Xcode and run the SapoWhisper scheme.
The tracked project defaults to local signing (Sign to Run Locally) so contributors can build without the maintainer's Apple Developer Team ID.
make format
make lint
make ci-check

- make format: format changed Swift files with Xcode's bundled swift-format.
- make lint: lint changed Swift files without editing them.
- make test: run the SapoWhisperTests unit bundle.
- make ci-check: lint + Debug build + unit tests.
- make release-check: lint + Release build + bundle size audit.
- make install-dev: signed Release reinstall to /Applications for local UI iteration without resetting macOS permission grants.
- make format-all / make lint-all: full-repo passes for planned formatting work.

Optional hooks:
make hooks-install

Release builds target Apple Silicon only.
make release-check
scripts/measure_release_bundle.sh \
  build/audit-release/Build/Products/Release/SapoWhisper.app

Current arm64 cleanup baseline:

- .app: 29,624 KB -> 20,624 KB (-30.38%)
- executable: 17,708 KB -> 8,712 KB (-50.80%)
- local compressed test DMG: about 13-14 MB

Local test DMGs are usually ad-hoc signed with hardened runtime. Do not present them as notarized unless notarization was explicitly verified.
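
The percentages in the baseline above can be re-derived from the before and after sizes. The helper below is a minimal sketch for checking new measurements against the baseline; `percent_change` is a name made up for this example and is not part of scripts/measure_release_bundle.sh.

```shell
#!/usr/bin/env bash
# Relative size change between two measurements, in percent.
# Usage: percent_change BEFORE_KB AFTER_KB
percent_change() {
  awk -v before="$1" -v after="$2" \
    'BEGIN { printf "%.2f%%\n", (after - before) / before * 100 }'
}

percent_change 29624 20624   # .app bundle  -> -30.38%
percent_change 17708 8712    # executable   -> -50.80%
```

A negative result means the bundle shrank; pass sizes without thousands separators.
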
Unit tests live in SapoWhisperTests and cover pure logic: the AI polish fidelity guard, failure mapping, engine migration, audio upload quality, realtime replay conversion, and settings transfer.
make test

Use make ci-check (lint + build + tests) as the main local gate and make release-check before packaging.
Tracked and public-safe:
- Source code, app assets, localized strings, sound effects, entitlements, Xcode project metadata, shared scheme, Package.resolved, Makefile, scripts, README, changelog, AGENTS.md, contributing notes, security notes, and license.

Ignored and local/private:
- DMG/, docs/, .agents/, .claude/, .codex/, skills-lock.json, xcuserdata/, build/, logs, crash reports, credentials, .env*, exported audio, DMGs, archives, and local signing files.

Before opening a PR:
make ci-check
git diff --check

See CONTRIBUTING.md.
MIT