Speak any language. In your own voice.
Real-time voice translation with a virtual microphone that works in every meeting app.
Quick Start · Architecture · Setup Guide · API Reference
English ↔ Japanese ↔ Indonesian ↔ Russian ↔ Korean, real-time, in the speaker's own cloned voice.
You're in a meeting with colleagues in Tokyo, clients in São Paulo, and partners in Berlin. You speak Indonesian. They hear you, fluently, naturally, and instantly, in Japanese, Portuguese, and German. In your voice.
Not a robotic translation. Not a subtitle at the bottom of the screen. Not a five-second delay while some server thinks about it.
You. Speaking their language. In real time. In your own voice.
VoiceBridge captures your microphone, transcribes your speech, translates it through an LLM, clones your voice, and outputs the translated audio through a virtual microphone, so any meeting app hears the translated version. Other participants don't install anything. They don't configure anything. They just hear you, speaking their language, as if you always could.
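The flow above can be pictured as a chain of async stages. This is a minimal sketch, not VoiceBridge's actual code: the `Stages` interface, `runUtterance`, and the stub stages are hypothetical names. It also awaits each stage in full, which is a simplification of what the README describes as a streaming pipeline.

```typescript
// Hypothetical stage signatures. The real services (speech-to-text, LLM,
// text-to-speech, virtual mic) are replaced by stubs so the sketch runs alone.
interface Stages {
  transcribe(audio: Float32Array): Promise<string>;
  translate(text: string, targetLang: string): Promise<string>;
  synthesize(text: string): Promise<Float32Array>;
  writeToVirtualMic(audio: Float32Array): Promise<void>;
}

// Runs one utterance through every stage in order and returns the translated text.
async function runUtterance(
  stages: Stages,
  micAudio: Float32Array,
  targetLang: string,
): Promise<string> {
  const sourceText = await stages.transcribe(micAudio);
  const translated = await stages.translate(sourceText, targetLang);
  const speech = await stages.synthesize(translated);
  await stages.writeToVirtualMic(speech);
  return translated;
}

// Stub stages for demonstration; each one records that it was called.
const calls: string[] = [];
const stubStages: Stages = {
  async transcribe() {
    calls.push("transcribe");
    return "selamat pagi";
  },
  async translate(text, lang) {
    calls.push("translate");
    return `[${lang}] ${text}`;
  },
  async synthesize(text) {
    calls.push("synthesize");
    return new Float32Array(text.length);
  },
  async writeToVirtualMic() {
    calls.push("output");
  },
};
```

Calling `runUtterance(stubStages, new Float32Array(160), "ja")` resolves once all four stubs have run in order. Swapping a stub for a real client would not change the calling code.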
| Requirement | macOS | Ubuntu/Linux | Windows |
|---|---|---|---|
| Node.js 18+ | nodejs.org | sudo apt install nodejs npm | nodejs.org |
| ffmpeg | brew install ffmpeg | sudo apt install ffmpeg | ffmpeg.org/download |
| Homebrew | brew.sh | n/a | n/a |
| PulseAudio/PipeWire | n/a | Pre-installed on Ubuntu 22.04+ | n/a |
| ElevenLabs API key | elevenlabs.io | elevenlabs.io | elevenlabs.io |
| LLM API key | openrouter.ai / openai.com / anthropic.com | same | same |
ffmpeg is required for real-time mic capture and virtual mic audio output. Without it, VoiceBridge falls back to a silent mock (no audio).
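One way to picture that fallback is to build ffmpeg capture arguments for the current platform and switch to a silent mock when ffmpeg is missing. This is a sketch under assumptions, not the project's code: `chooseCapture`, `CaptureSource`, and the device strings (`:0`, `default`, `audio=Microphone`) are hypothetical placeholders. Only the ffmpeg input formats (avfoundation, pulse, dshow) and the 16 kHz rate come from this README.

```typescript
type Platform = "darwin" | "linux" | "win32";

type CaptureSource =
  | { kind: "ffmpeg"; args: string[] }
  | { kind: "silent-mock" };

// ffmpeg input format per platform, as listed for the Capture stage.
const INPUT_FORMAT: Record<Platform, string> = {
  darwin: "avfoundation",
  linux: "pulse",
  win32: "dshow",
};

// Placeholder device names (assumption); real device selection is machine-specific.
const DEFAULT_DEVICE: Record<Platform, string> = {
  darwin: ":0",
  linux: "default",
  win32: "audio=Microphone",
};

// Returns ffmpeg arguments for mic capture, or a silent mock if ffmpeg is absent.
function chooseCapture(platform: Platform, ffmpegAvailable: boolean): CaptureSource {
  if (!ffmpegAvailable) {
    return { kind: "silent-mock" };
  }
  return {
    kind: "ffmpeg",
    args: [
      "-f", INPUT_FORMAT[platform],
      "-i", DEFAULT_DEVICE[platform],
      "-ac", "1",      // mono
      "-ar", "16000",  // 16 kHz sample rate
      "-f", "s16le",   // raw 16-bit little-endian PCM
      "pipe:1",        // write samples to stdout
    ],
  };
}
```

Returning a tagged union rather than throwing keeps the "no ffmpeg" case explicit, so the caller can surface a warning instead of silently producing no audio.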
┌─────────┐    ┌────────────┐    ┌─────────────┐    ┌────────────┐    ┌──────────────┐
│  Your   │───▶│ Transcribe │───▶│  Translate  │───▶│ Your Clone │───▶│ Virtual Mic  │
│  Voice  │    │  (Scribe)  │    │    (LLM)    │    │   Voice    │    │ "VoiceBridge │
│  16kHz  │    │   150ms    │    │    300ms    │    │    75ms    │    │     Mic"     │
└─────────┘    └────────────┘    └─────────────┘    └────────────┘    └──────────────┘
Five stages. Under 1.5 seconds. Works everywhere.
| Stage | What Happens | Technology | Latency |
|---|---|---|---|
| Capture | Real mic audio captured via ffmpeg | avfoundation (macOS) / pulse (Linux) / dshow (Windows) | 10ms |
| Transcribe | Speech becomes text in real-time | ElevenLabs Scribe v2 Realtime | 150ms |
| Translate | Text translated token-by-token | OpenAI / Anthropic / OpenRouter | 300ms |
| Synthesize | Translated text becomes speech in your voice | ElevenLabs Flash v2.5 TTS | 75ms |
| Output | Translated audio written to virtual mic | ffmpeg → BlackHole / PulseAudio / VB-CABLE | 10ms |
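Adding up the nominal per-stage figures shows how much of the 1.5-second budget remains for network jitter and buffering. The numbers are the ones in the table. Treating them as simply additive is an assumption, since the stages stream and real end-to-end delay may differ. `nominalLatencyMs` is a hypothetical helper.

```typescript
// Per-stage latency figures in milliseconds, copied from the table above.
const STAGE_LATENCY_MS: ReadonlyArray<[string, number]> = [
  ["capture", 10],
  ["transcribe", 150],
  ["translate", 300],
  ["synthesize", 75],
  ["output", 10],
];

// Sums the nominal stage latencies.
function nominalLatencyMs(stages: ReadonlyArray<[string, number]>): number {
  return stages.reduce((sum, [, ms]) => sum + ms, 0);
}

const totalMs = nominalLatencyMs(STAGE_LATENCY_MS); // 545
const budgetMs = 1500;                              // "Under 1.5 seconds"
const headroomMs = budgetMs - totalMs;              // 955
```

On these figures, translation accounts for more than half of the nominal total, so the choice of LLM is the most likely place for the budget to slip.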
┌─────────────────────────────────────────────────────────┐
│                  Electron Desktop App                   │
│                                                         │
│  ┌────────────────────┐    ┌──────────────────────────┐ │
│  │ Main Process       │    │ Renderer (Preact)        │ │
│  │ Node.js + N-API    │◄──►│ Nothing Design System    │ │
│  │                    │IPC │                          │ │
│  │ • Pipeline         │    │ • Main Window (360×480)  │ │
│  │ • Audio Router     │    │ • System Tray            │ │
│  │ • Settings         │    │ • Settings View          │ │
│  │ • Driver Mgmt      │    │ • Debug Log              │ │
│  └─────────┬──────────┘    └──────────────────────────┘ │
│            │                                            │
│  ┌─────────▼──────────┐                                 │
│  │ Audio I/O          │                                 │
│  │ (ffmpeg)           │                                 │
│  │                    │                                 │
│  │ • Mic Capture      │                                 │
│  │ • Virtual Mic Out  │                                 │
│  │ • Resampling       │                                 │
│  └─────────┬──────────┘                                 │
└────────────┼────────────────────────────────────────────┘
             │
┌────────────▼────────────────────────────────────────────┐
│                     OS Audio Layer                      │
│                                                         │
│  ┌────────────┐    ┌─────────────────────┐              │
│  │ Real Mic   │    │ "VoiceBridge Mic"   │              │
│  │ (hardware) │    │ (virtual driver)    │              │
│  └────────────┘    └──────────┬──────────┘              │
│                               │                         │
│                    ┌──────────▼──────────┐              │
│                    │ Any Meeting App     │              │
│                    │ Teams / Zoom / Meet │              │
│                    │ Discord / Slack     │              │
│                    └─────────────────────┘              │
└─────────────────────────────────────────────────────────┘
| OS | What Gets Installed | How |
|---|---|---|
| macOS | BlackHole 2ch | brew install blackhole-2ch |
| Ubuntu/Linux | PulseAudio/PipeWire null sink | pactl load-module module-null-sink |
| Windows | VB-CABLE | Manual download + Run as Administrator |
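The table above can be expressed as a small lookup that an installer might consult. This is an illustrative sketch only: `driverPlan`, `HostOS`, and `DriverPlan` are hypothetical names, and the commands are copied from the table rather than taken from the project's `driver-installer.ts`.

```typescript
type HostOS = "macOS" | "linux" | "windows";

// Either a command that can be run automatically, or a note for a manual step.
type DriverPlan =
  | { driver: string; automated: true; command: string[] }
  | { driver: string; automated: false; note: string };

// Maps each OS to the virtual mic driver and install method from the table.
function driverPlan(os: HostOS): DriverPlan {
  switch (os) {
    case "macOS":
      return {
        driver: "BlackHole 2ch",
        automated: true,
        command: ["brew", "install", "blackhole-2ch"],
      };
    case "linux":
      return {
        driver: "PulseAudio/PipeWire null sink",
        automated: true,
        command: ["pactl", "load-module", "module-null-sink"],
      };
    case "windows":
      return {
        driver: "VB-CABLE",
        automated: false,
        note: "Manual download + Run as Administrator",
      };
  }
}
```

Keeping the Windows case as a non-automated variant makes the manual step visible in the type, so calling code cannot accidentally try to run a command that does not exist.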
Speak naturally. Your words are transcribed, translated, and re-spoken in your cloned voice, all while you're still finishing your sentence.
Hold SPACE or the on-screen button to talk. Each press is an independent utterance: no accumulation, no feedback loops, no background noise pickup.
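The "independent utterance" behaviour can be sketched as a tiny state machine: audio is buffered only while the key is held, and each release hands back exactly that press's audio. The `PushToTalk` class and its method names are hypothetical, not taken from the VoiceBridge source.

```typescript
class PushToTalk {
  private held = false;
  private chunks: Float32Array[] = [];

  // Key pressed: start a fresh utterance. Auto-repeat keydown events are ignored.
  press(): void {
    if (this.held) return;
    this.held = true;
    this.chunks = [];
  }

  // Audio arriving while the key is up is dropped, so background noise is never queued.
  onAudio(chunk: Float32Array): void {
    if (this.held) this.chunks.push(chunk);
  }

  // Key released: return this utterance only, then reset so nothing accumulates.
  release(): Float32Array[] {
    if (!this.held) return [];
    this.held = false;
    const utterance = this.chunks;
    this.chunks = [];
    return utterance;
  }
}
```

Because the buffer is replaced on both press and release, nothing from one utterance can leak into the next, which is the property the "no accumulation" claim depends on.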
Record 30 seconds. VoiceBridge clones your voice. Now you speak 90+ languages and it still sounds like you.
Teams. Zoom. Google Meet. Discord. Slack. FaceTime. WhatsApp. Any app that uses a microphone.
OLED blacks. Space Mono labels. Mechanical toggles. System tray app that stays out of your way.
# macOS
brew install ffmpeg sox
# Ubuntu/Debian
sudo apt install ffmpeg sox
# Windows: download from https://ffmpeg.org/download.html

git clone https://github.com/AlleyBo55/VoiceBridge.git
cd VoiceBridge/desktop
npm install

npm run dev

VoiceBridge walks you through setup:
- Prerequisites: checks for ffmpeg, sox, and a virtual mic driver. One-click install for each.
- API Keys: enter your ElevenLabs key and LLM key. Keys are validated before saving.
- Voice Clone: record 30+ seconds of your voice. Skip to use a default voice.
Keys are encrypted with AES-GCM-256 and stored only on your device. VoiceBridge has no server.
- Open any meeting app and select the virtual mic ("BlackHole 2ch" on macOS) as your microphone
- Toggle translation on in VoiceBridge
- Hold SPACE and speak; other participants hear your translated voice
Input: Every language ElevenLabs Scribe supports. Auto-detect is default. Output: Every language ElevenLabs TTS supports. Any-to-any. No restrictions.
| Layer | Choice | Why |
|---|---|---|
| App Shell | Electron | Cross-platform desktop, native addon support |
| UI | Preact + CSS Custom Properties | 3KB gzipped, Nothing design system |
| Audio I/O | ffmpeg | Real mic capture + virtual mic output |
| Virtual Mic | BlackHole / PulseAudio / VB-CABLE | OS-level virtual audio device |
| STT | ElevenLabs Scribe v2 Realtime | 150ms latency, 90+ languages |
| TTS | ElevenLabs Flash v2.5 | 75ms latency, voice cloning |
| Translation | OpenAI / Anthropic / OpenRouter | Streaming, 200+ models |
| Testing | Vitest + fast-check | Property-based correctness |
- Audio is streamed, never stored
- API keys encrypted with AES-GCM-256
- No analytics. No tracking. No telemetry.
- No embedded keys: the build ships empty
- Panic button (Ctrl/Cmd+Shift+X) kills everything instantly
| Shortcut | Action |
|---|---|
| Space | Push-to-talk (hold to speak) |
| Ctrl/Cmd+Shift+T | Toggle translation |
| Ctrl/Cmd+Shift+G | Toggle Ghost Mode |
| Ctrl/Cmd+Shift+X | Panic stop |
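Electron writes cross-platform shortcuts as accelerator strings, where `CommandOrControl` means Cmd on macOS and Ctrl elsewhere. The sketch below converts the labels listed above into that form. The action identifiers and the `toAccelerator` helper are hypothetical, and how VoiceBridge actually registers its shortcuts is not stated here. Push-to-talk is left out because holding a key needs separate key-down and key-up handling.

```typescript
// Shortcut labels as written above, mapped to hypothetical action identifiers.
const SHORTCUTS: ReadonlyArray<[string, string]> = [
  ["Ctrl/Cmd+Shift+T", "toggle-translation"],
  ["Ctrl/Cmd+Shift+G", "toggle-ghost-mode"],
  ["Ctrl/Cmd+Shift+X", "panic-stop"],
];

// Converts a README-style label into an Electron accelerator string.
function toAccelerator(label: string): string {
  return label
    .split("+")
    .map((part) => (part === "Ctrl/Cmd" ? "CommandOrControl" : part))
    .join("+");
}

// Accelerator string paired with the action it should trigger.
const bindings = SHORTCUTS.map(
  ([label, action]): [string, string] => [toAccelerator(label), action],
);
```

Each pair in `bindings` could then be handed to Electron's `globalShortcut.register` in the main process, with the action identifier selecting the handler.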
cd desktop
npm install # Install dependencies
npm run dev # Build + launch Electron with hot-reload
npm run test     # Run 42 property-based tests

desktop/
├── src/
│   ├── main/                    # Electron main process
│   │   ├── main.ts              # Entry, tray, window, IPC
│   │   ├── desktop-pipeline.ts  # Mic → STT → LLM → TTS → BlackHole
│   │   ├── audio-router.ts      # VAD, noise gate, routing
│   │   ├── driver-installer.ts  # Virtual mic driver install
│   │   └── ...
│   ├── native/                  # ffmpeg audio I/O
│   ├── preload/                 # Security boundary
│   ├── renderer/                # Preact UI
│   └── shared/                  # Types, platform utils
└── tests/properties/            # Property-based tests
This project was built using Kiro's spec-driven development: requirements → design → implementation, systematically.
Phase 1: Chrome Extension
- Requirements · Design · Tasks
Phase 2: Pipeline Hardening
- Requirements · Design · Tasks
Phase 3: Desktop App
- Requirements · Design · Tasks
MIT: use it, fork it, ship it.
"The people who are crazy enough to think they can change the world are the ones who do."
Built for ElevenLabs × Kiro Hackathon
ElevenLabs · Kiro · #ElevenHacks · #CodeWithKiro



