adapter-mac - Speech To Text & Text To Speech

Native macOS application for system-wide speech-to-text and text-to-speech conversion. Requires the brute agent to be running locally.

Features

Automatic speech-to-text capture and paste into any focused input with a single key press (F12)
- Floating recording window with live waveform visualization
Automatic text-to-speech generation from the currently selected text with a single key press (also F12)
- Floating playback window for text-to-speech with stop, pause, and seek controls
Brute AI agent session creation from speech with a single key press (F11)
Smart context detection:
- Text selected → Text-to-Speech (plays audio)
- No selection → Speech-to-Text (records audio, transcribes, pastes result)
Menu bar presence with settings window
- Selectable TTS engines in Settings: automatic, native macOS speech, and edge-tts
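
The smart context detection rule above can be sketched as a small pure function. This is only an illustration, not the app's actual code: `ShortcutAction` and `action(forSelection:)` are hypothetical names, and treating a whitespace-only selection as "no selection" is an assumption.

```swift
import Foundation

// Hypothetical types for illustration; not taken from the adapter-mac source.
enum ShortcutAction: Equatable {
    case speak(String)   // text selected: convert it to speech
    case record          // no selection: start speech-to-text recording
}

/// Decides what the F12 shortcut should do, given the current text selection.
func action(forSelection selection: String?) -> ShortcutAction {
    // Assumption: a selection containing only whitespace counts as empty.
    let trimmed = selection?.trimmingCharacters(in: .whitespacesAndNewlines) ?? ""
    return trimmed.isEmpty ? .record : .speak(trimmed)
}
```

Keeping the decision separate from the Accessibility lookup that produces the selection makes the rule easy to unit test without granting any permissions.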

Requirements

macOS 13.0+
Xcode 14.0+
Microphone permissions
Accessibility permissions (for global shortcuts and text insertion)
Optional: edge-tts in PATH or a common local install location for higher-quality online TTS
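
A lookup for an optional tool such as edge-tts might work roughly as follows. This is a sketch under stated assumptions: `findExecutable` is a hypothetical helper, and the extra directories passed in are whatever the caller considers a "common local install location"; the README does not list them.

```swift
import Foundation

/// Hypothetical helper: returns the first executable called `name` found in the
/// colon-separated `pathVariable`, then in `extraDirectories`, or nil if none exists.
func findExecutable(named name: String,
                    pathVariable: String,
                    extraDirectories: [String] = [],
                    fileManager: FileManager = .default) -> String? {
    let pathDirectories = pathVariable.split(separator: ":").map(String.init)
    for directory in pathDirectories + extraDirectories {
        let candidate = URL(fileURLWithPath: directory)
            .appendingPathComponent(name)
            .path
        if fileManager.isExecutableFile(atPath: candidate) {
            return candidate
        }
    }
    return nil
}
```

A caller could pass `ProcessInfo.processInfo.environment["PATH"] ?? ""` as `pathVariable`. Checking extra directories is useful because an app launched from Finder usually does not inherit the PATH configured in your shell profile.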

Quick Start

Start the backend (required for speech-to-text):
```
./scripts/start-backend.sh
```
Open in Xcode:
```
open adapter-mac.xcodeproj
```
Build and Run (Cmd+R in Xcode)
Grant permissions when prompted:
- Microphone access
- Accessibility access
Open Settings and confirm the backend URL if needed.
Test it:
- Select any text → Press F12 → Listen to speech
- No selection → Press F12 → Speak → Press F12 again → Text pasted
- Press F11 → Speak → Press F11 again → New brute session starts from the transcript

Setup

Backend Setup

adapter-mac depends on the A2gent brute backend for Whisper transcription. Speech-to-text will not work unless that service is running.

cd ~/git/a2gent/brute
make run

Or use the helper script:

./scripts/start-backend.sh

Default transcription endpoint:

http://localhost:5445/speech/transcribe

Test the endpoint:

./scripts/test-whisper.sh
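
If you change the backend URL in Settings, the transcription endpoint is presumably derived from it. The sketch below shows one way to compose that URL. The default host, port, and `/speech/transcribe` path come from this README; the function name and the trailing-slash handling are assumptions.

```swift
import Foundation

// Default taken from this README; everything else here is illustrative.
let defaultBackendBaseURL = "http://localhost:5445"

/// Builds the transcription endpoint from a backend base URL.
/// Returns nil if the base URL has no scheme or host.
func transcriptionURL(backendBaseURL: String = defaultBackendBaseURL) -> URL? {
    guard var components = URLComponents(string: backendBaseURL),
          components.scheme != nil,
          components.host != nil else {
        return nil
    }
    // Drop a trailing slash so "http://host:5445/" and "http://host:5445" agree.
    let basePath = components.path.hasSuffix("/")
        ? String(components.path.dropLast())
        : components.path
    components.path = basePath + "/speech/transcribe"
    return components.url
}
```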

Text-to-Speech Privacy

adapter-mac supports:

edge-tts for higher-quality voices via Microsoft online TTS
native macOS speech synthesis as a local fallback

When edge-tts is selected or used by the automatic engine, the selected text is sent to Microsoft's online text-to-speech service to generate audio. If you prefer local-only speech synthesis, choose the native macOS voice option in Settings.
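
The privacy consequence of each engine setting can be made explicit in code. This is a minimal sketch, not the app's implementation: `TTSEngine`, `resolveEngine`, and `sendsTextOnline` are hypothetical names, and the rule that an unavailable edge-tts falls back to native speech is an assumption based on the "local fallback" wording above.

```swift
import Foundation

// Hypothetical model of the three engine options listed in Settings.
enum TTSEngine: Equatable {
    case automatic
    case native    // macOS speech synthesis, stays on the machine
    case edgeTTS   // Microsoft online TTS via edge-tts
}

/// Resolves a setting to the engine that would actually run.
/// Assumption: automatic prefers edge-tts when installed, and any request for
/// edge-tts falls back to native speech when it is not available.
func resolveEngine(_ setting: TTSEngine, edgeTTSAvailable: Bool) -> TTSEngine {
    switch setting {
    case .native:
        return .native
    case .automatic, .edgeTTS:
        return edgeTTSAvailable ? .edgeTTS : .native
    }
}

/// True when the resolved engine sends the selected text to an online service.
func sendsTextOnline(_ resolved: TTSEngine) -> Bool {
    return resolved == .edgeTTS
}
```

With this model, only the native setting guarantees local-only synthesis regardless of what is installed.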

Architecture

Swift + AppKit for native macOS experience
AVFoundation for audio recording and playback
Carbon for global keyboard shortcuts
Accessibility API for text selection detection and insertion
brute backend integration for speech-to-text

flowchart TD
    AD["AppDelegate"] --> AX["AccessibilityService"]
    AD --> AS["AudioService"]
    AD --> RW["RecordingWindow"]
    AD --> PW["PlaybackWindow"]
    AD --> WS["WhisperService"]

    AS --> EDGE["edge-tts (online)"]
    AS --> NSS["macOS speech synthesis (local fallback)"]
    AS --> PLAYER["AVAudioPlayer"]

    WS --> BRUTE["brute backend"]

Usage

Click the menu bar icon to configure settings
Press the configured shortcut:
- With text selected: Converts text to speech and plays audio
- Without selection: Opens recording window
While recording, press the shortcut again to stop and transcribe
The transcribed text is automatically pasted at the cursor position
Use the brute session shortcut to record a fresh prompt and send it straight into a new brute session
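
The press-to-start, press-again-to-stop flow described above behaves like a small state machine. The sketch below models it with hypothetical names (`RecorderState`, `RecorderEvent`, `nextState`). Ignoring shortcut presses while a transcription is in flight is an assumption, not documented behavior.

```swift
import Foundation

// Hypothetical states and events for the recording shortcut.
enum RecorderState: Equatable {
    case idle          // waiting for the shortcut
    case recording     // recording window open, capturing audio
    case transcribing  // audio sent to the backend, waiting for text
}

enum RecorderEvent {
    case shortcutPressed
    case transcriptReady
}

/// Pure transition function: returns the state that follows `state` after `event`.
func nextState(_ state: RecorderState, on event: RecorderEvent) -> RecorderState {
    switch (state, event) {
    case (.idle, .shortcutPressed):
        return .recording
    case (.recording, .shortcutPressed):
        return .transcribing
    case (.transcribing, .transcriptReady):
        return .idle   // the transcript is pasted at this point
    default:
        return state   // assumption: other events are ignored in this state
    }
}
```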

License

Private project
