Native macOS application for system-wide speech-to-text and text-to-speech conversion. Requires the brute agent to be running locally.
- Automatic speech-to-text capture and paste into any focused input with a key press (F12)
- Floating recording window with live waveform visualization
- Automatic text-to-speech generation of the currently selected text with a key press (also F12)
- Floating playback window for text-to-speech with stop, pause, and seek controls
- Brute AI agent session creation from speech with a key press (F11)
- Smart context detection:
  - Text selected → Text-to-Speech (plays audio)
  - No selection → Speech-to-Text (records audio, transcribes, pastes result)
- Menu bar presence with settings window
- Selectable TTS engines in Settings: automatic, native macOS speech, and edge-tts
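
The smart context detection listed above can be pictured as one small decision function. The following is a minimal sketch, not the app's actual code: `ShortcutAction` and `action(forSelection:)` are hypothetical names, and treating a whitespace-only selection as "no selection" is an assumption.

```swift
import Foundation

/// Hypothetical action type; the app's real types are not shown in this README.
enum ShortcutAction: Equatable {
    case speak(String)   // text-to-speech for the selected text
    case startRecording  // speech-to-text: record, transcribe, paste
}

/// Chooses what the F12 shortcut should do based on the current selection.
/// Assumption: a whitespace-only selection is treated as "no selection".
func action(forSelection selection: String?) -> ShortcutAction {
    guard let text = selection?.trimmingCharacters(in: .whitespacesAndNewlines),
          !text.isEmpty else {
        return .startRecording
    }
    return .speak(text)
}
```

Keeping the decision free of side effects makes it easy to unit test separately from the Accessibility and audio code.
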
- macOS 13.0+
- Xcode 14.0+
- Microphone permissions
- Accessibility permissions (for global shortcuts and text insertion)
- Optional: edge-tts in PATH or a common local install location for higher-quality online TTS
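
Finding edge-tts "in PATH or a common local install location" might look like the sketch below. The function name and the fallback directories are assumptions, not taken from the app's source. Fallback directories are useful because apps launched from Finder usually do not inherit the shell's PATH.

```swift
import Foundation

/// Returns the first executable `edge-tts` found in PATH or in a fallback directory.
/// The fallback directories are assumptions chosen for illustration.
func findEdgeTTS(
    environment: [String: String] = ProcessInfo.processInfo.environment,
    fallbackDirectories: [String] = ["/opt/homebrew/bin", "/usr/local/bin",
                                     NSHomeDirectory() + "/.local/bin"],
    isExecutable: (String) -> Bool = { FileManager.default.isExecutableFile(atPath: $0) }
) -> String? {
    // PATH entries are checked first, in order, then the fallback directories.
    let pathDirectories = (environment["PATH"] ?? "")
        .split(separator: ":")
        .map(String.init)
    for directory in pathDirectories + fallbackDirectories {
        let candidate = URL(fileURLWithPath: directory)
            .appendingPathComponent("edge-tts").path
        if isExecutable(candidate) { return candidate }
    }
    return nil
}
```

The `isExecutable` parameter exists only so the lookup can be exercised without touching the real file system.
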
- Start the backend (required for speech-to-text):
    ./scripts/start-backend.sh
- Open in Xcode:
    open adapter-mac.xcodeproj
- Build and run (Cmd+R in Xcode).
- Grant permissions when prompted:
  - Microphone access
  - Accessibility access
- Open Settings and confirm the backend URL if needed.
- Test it:
  - Select any text → Press F12 → Listen to speech
  - No selection → Press F12 → Speak → Press F12 again → Text is pasted
  - Press F11 → Speak → Press F11 again → New brute session starts from the transcript
adapter-mac depends on the A2gent brute backend for Whisper transcription. Speech-to-text will not work unless that service is running.
cd ~/git/a2gent/brute
make run
Or use the helper script:
./scripts/start-backend.sh
Default transcription endpoint:
http://localhost:5445/speech/transcribe
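
If the backend URL in Settings is stored as a base URL, the endpoint above could be derived from it roughly as follows. This is a sketch under that assumption: `defaultBackendURL` and `transcriptionEndpoint(baseURL:)` are hypothetical names, and only the host, port, and path come from this README.

```swift
import Foundation

/// Hypothetical default; host and port are taken from the endpoint shown above.
let defaultBackendURL = "http://localhost:5445"

/// Builds the transcription endpoint from a configurable backend base URL.
/// Returns nil when the configured value is empty or has no scheme or host.
func transcriptionEndpoint(baseURL: String = defaultBackendURL) -> URL? {
    guard let base = URL(string: baseURL),
          base.scheme != nil, base.host != nil else {
        return nil  // reject empty or malformed settings values
    }
    return base
        .appendingPathComponent("speech")
        .appendingPathComponent("transcribe")
}
```
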
Test the endpoint:
./scripts/test-whisper.sh
adapter-mac supports:
- edge-tts for higher-quality voices via Microsoft online TTS
- native macOS speech synthesis as a local fallback
When edge-tts is selected or used by the automatic engine, the selected text is sent to Microsoft's online text-to-speech service to generate audio. If you prefer local-only speech synthesis, choose the native macOS voice option in Settings.
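
The engine choice described above can be sketched as a small resolution step. The type and function names are hypothetical. The assumptions are that "automatic" prefers edge-tts when it is available, and that an explicit edge-tts choice also falls back to native speech when the tool is missing.

```swift
import Foundation

/// Engine options named in Settings, per this README. The Swift names are hypothetical.
enum TTSEnginePreference { case automatic, nativeMacOS, edgeTTS }

/// The engine that actually produces audio.
enum ResolvedTTSEngine: Equatable { case nativeMacOS, edgeTTS }

/// Assumption: edge-tts is used only when available; otherwise native speech
/// acts as the local fallback. Choosing native speech never sends text online.
func resolveEngine(_ preference: TTSEnginePreference,
                   edgeTTSAvailable: Bool) -> ResolvedTTSEngine {
    switch preference {
    case .nativeMacOS:
        return .nativeMacOS
    case .automatic, .edgeTTS:
        return edgeTTSAvailable ? .edgeTTS : .nativeMacOS
    }
}
```
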
- Swift + AppKit for native macOS experience
- AVFoundation for audio recording and playback
- Carbon for global keyboard shortcuts
- Accessibility API for text selection detection and insertion
- brute backend integration for speech-to-text
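
As an illustration of selection detection through the Accessibility API, the sketch below reads the selected text of the focused element. It is a minimal example, not the app's `AccessibilityService`, and `selectedText()` is a hypothetical name. Some applications may not expose a selected-text attribute, in which case it returns nil.

```swift
import ApplicationServices

/// Returns the selected text of the focused UI element, or nil when there is
/// no selection, no focused element, or Accessibility access is not granted.
func selectedText() -> String? {
    // Without Accessibility permission the queries below cannot succeed.
    guard AXIsProcessTrusted() else { return nil }

    let systemWide = AXUIElementCreateSystemWide()
    var focusedRef: CFTypeRef?
    guard AXUIElementCopyAttributeValue(systemWide,
                                        kAXFocusedUIElementAttribute as CFString,
                                        &focusedRef) == .success,
          let focused = focusedRef,
          CFGetTypeID(focused) == AXUIElementGetTypeID() else { return nil }

    var selectedRef: CFTypeRef?
    guard AXUIElementCopyAttributeValue(focused as! AXUIElement,
                                        kAXSelectedTextAttribute as CFString,
                                        &selectedRef) == .success,
          let text = selectedRef as? String, !text.isEmpty else { return nil }
    return text
}
```
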
flowchart TD
    AD["AppDelegate"] --> AX["AccessibilityService"]
    AD --> AS["AudioService"]
    AD --> RW["RecordingWindow"]
    AD --> PW["PlaybackWindow"]
    AD --> WS["WhisperService"]
    AS --> EDGE["edge-tts (online)"]
    AS --> NSS["macOS speech synthesis (local fallback)"]
    AS --> PLAYER["AVAudioPlayer"]
    WS --> BRUTE["brute backend"]
- Click the menu bar icon to configure settings
- Press the configured shortcut:
  - With text selected: converts the text to speech and plays the audio
  - Without a selection: opens the recording window
- While recording, press the shortcut again to stop and transcribe
- Transcribed text is automatically pasted at the cursor position
- Use the brute session shortcut to record a fresh prompt and send it straight into a new brute session
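
The recording flow above (press to start, press again to stop and transcribe, then paste) can be modeled as a small state machine. This is an illustrative sketch with hypothetical names. Ignoring shortcut presses while a transcription is in flight is an assumption.

```swift
import Foundation

/// Hypothetical states for the recording shortcut.
enum RecordingState: Equatable { case idle, recording, transcribing }

/// Hypothetical events that drive the flow.
enum RecordingEvent { case shortcutPressed, transcriptReady }

/// Pure transition function: returns the next state for an event.
/// Assumption: shortcut presses are ignored while transcribing.
func nextState(from state: RecordingState,
               on event: RecordingEvent) -> RecordingState {
    switch (state, event) {
    case (.idle, .shortcutPressed):
        return .recording      // open the recording window
    case (.recording, .shortcutPressed):
        return .transcribing   // stop and send the audio for transcription
    case (.transcribing, .transcriptReady):
        return .idle           // paste the text at the cursor position
    default:
        return state           // ignore events that do not apply
    }
}
```
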
Private project
