Voice Agent

A low-latency, real-time voice assistant for the terminal.

It captures microphone audio, runs voice activity detection (VAD) locally with Silero, transcribes speech with OpenAI Whisper, generates responses with GPT-4o-mini, and streams speech back with GPT-4o-mini-tts. It supports barge-in, so users can interrupt playback just by speaking.

Why this project

Real-time, full-duplex interaction loop designed for responsiveness
Practical async architecture with careful thread/async boundaries around sounddevice
Production-minded cancellation design for smooth barge-in behavior
Good reference implementation for voice pipeline orchestration in Python
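
The thread/async boundary mentioned above can be illustrated with a small sketch. sounddevice invokes its audio callback on a separate thread, while asyncio.Queue is not thread-safe, so one common pattern is to hand each chunk to the event loop with loop.call_soon_threadsafe. This is not the project's actual code: start_fake_capture and collect_chunks are hypothetical names, and a plain thread stands in for the audio callback so the example runs without audio hardware.

```python
import asyncio
import threading


def start_fake_capture(loop, queue, n_chunks):
    """Simulate an audio callback thread producing chunks (hypothetical helper)."""

    def worker():
        for i in range(n_chunks):
            chunk = [0.0] * 512  # placeholder for one 512-sample float32 chunk
            # asyncio.Queue is not thread-safe, so schedule the put on the loop thread.
            loop.call_soon_threadsafe(queue.put_nowait, (i, chunk))
        # None marks end of stream for the consumer.
        loop.call_soon_threadsafe(queue.put_nowait, None)

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread


async def collect_chunks(n_chunks):
    """Consume chunks on the event loop and return their indices in arrival order."""
    loop = asyncio.get_running_loop()
    queue = asyncio.Queue()
    thread = start_fake_capture(loop, queue, n_chunks)
    indices = []
    while True:
        item = await queue.get()
        if item is None:
            break
        indices.append(item[0])
    thread.join()
    return indices
```

Calling asyncio.run(collect_chunks(5)) returns the chunk indices in the order the producer thread emitted them, because call_soon_threadsafe schedules callbacks first-in, first-out.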

Features

Local microphone capture and speaker playback
Silero VAD speech start/end detection (16 kHz, 512-sample chunks)
Whisper transcription (whisper-1)
Streaming chat completions (gpt-4o-mini)
Sentence-level streaming TTS (gpt-4o-mini-tts)
Cooperative cancellation and immediate playback stop on interruption
Multi-turn memory in the chat layer
CLI-first workflow with minimal setup
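
As a rough illustration of the cooperative cancellation listed above, the sketch below cancels a playback task when a simulated speech start arrives. It is an assumption-laden stand-in, not the orchestrator's real logic: play_response and demo_barge_in are hypothetical names, and asyncio.sleep replaces actual PCM streaming.

```python
import asyncio


async def play_response(sentences, played, started):
    """Stand-in for TTS playback: one sleep per sentence instead of real audio."""
    try:
        for sentence in sentences:
            played.append(sentence)
            started.put_nowait(sentence)  # tell the demo that playback reached this sentence
            await asyncio.sleep(0.05)     # placeholder for streaming that sentence's PCM
    except asyncio.CancelledError:
        # A real player would stop the output stream and drop queued audio here.
        played.append("<stopped>")
        raise


async def demo_barge_in(sentences, interrupt_after):
    """Cancel playback once `interrupt_after` sentences have started."""
    played = []
    started = asyncio.Queue()
    task = asyncio.create_task(play_response(sentences, played, started))
    for _ in range(interrupt_after):
        await started.get()  # simulated SPEECH_START arrives during this sentence
    task.cancel()            # cooperative cancellation request
    try:
        await task           # wait until the player has actually stopped
    except asyncio.CancelledError:
        pass
    return played
```

With sentences ["one", "two", "three"] and interrupt_after=2, the result is ["one", "two", "<stopped>"]: the third sentence never starts, and the cleanup branch runs before control returns to the caller.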

Architecture

Mic (float32 chunks) -> VADDetector -> SPEECH_START -> barge-in cancel if playing
                               -> SPEECH_END (int16 PCM bytes)
                                   -> Transcriber (whisper-1) -> text
                                       -> ChatLLM (gpt-4o-mini stream) -> sentence text
                                           -> Synthesizer (gpt-4o-mini-tts) -> PCM stream
                                               -> AudioPlayback (sounddevice)
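
The diagram shows float32 chunks entering the VAD stage and int16 PCM bytes leaving it. The sketch below works through that conversion and the chunk arithmetic (512 samples at 16 kHz is 32 ms per chunk). It uses only the standard library so it runs anywhere; the project itself may well use NumPy instead, and the function names here are hypothetical.

```python
import struct

SAMPLE_RATE = 16_000   # Hz, as listed under Features
CHUNK_SAMPLES = 512    # samples per VAD chunk, as listed under Features


def chunk_duration_ms(samples=CHUNK_SAMPLES, rate=SAMPLE_RATE):
    """Duration of one chunk in milliseconds."""
    return 1000 * samples / rate


def split_into_chunks(samples, size=CHUNK_SAMPLES):
    """Yield full fixed-size chunks, dropping any trailing partial chunk."""
    for start in range(0, len(samples) - size + 1, size):
        yield samples[start:start + size]


def float_to_pcm16(samples):
    """Convert floats in [-1.0, 1.0] to little-endian int16 PCM bytes, clipping overflow."""
    ints = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    return struct.pack(f"<{len(ints)}h", *ints)
```

For example, chunk_duration_ms() gives 32.0, and a 1300-sample buffer yields two full chunks with the remaining 276 samples left for the next read.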

Latency target: under 1.5 seconds from the end of speech to the first playback chunk.
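
Sentence-level streaming presumably helps with a target like this, since synthesis can begin on the first complete sentence instead of waiting for the full reply. A minimal sketch of splitting a token stream into sentences follows. The splitting rule (terminal punctuation followed by whitespace) and the name iter_sentences are assumptions for illustration, not the logic in llm/chat.py.

```python
import re

# A sentence ends at ., ! or ? followed by whitespace (a simplifying assumption).
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def iter_sentences(tokens):
    """Yield complete sentences as soon as they appear in a stream of text tokens."""
    buffer = ""
    for token in tokens:
        buffer += token
        parts = _SENTENCE_END.split(buffer)
        # Every part except the last is a finished sentence.
        for sentence in parts[:-1]:
            if sentence:
                yield sentence
        buffer = parts[-1]  # keep the unfinished tail for the next token
    if buffer.strip():
        yield buffer.strip()  # flush whatever remains when the stream ends


if __name__ == "__main__":
    stream = ["Hel", "lo the", "re. How", " are you", "? Fine", "."]
    for sentence in iter_sentences(stream):
        print(sentence)
```

A rule this simple also splits after abbreviations such as "e.g.", so a production splitter would likely need extra handling for those cases.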

Quick start

1) Prerequisites

Python 3.11+
uv
OpenAI API key with access to whisper-1, gpt-4o-mini, and gpt-4o-mini-tts
PortAudio (sounddevice backend)

macOS:

brew install portaudio

2) Configure environment

cp .env.example .env

Set your key in .env:

OPENAI_API_KEY=sk-...

3) Run

uv run voice-agent

Verbose mode:

uv run voice-agent --verbose

Development

Run tests:

uv run pytest tests/

Project layout:

src/voice_agent/
  main.py                 # CLI entry point
  audio/capture.py        # microphone capture
  audio/playback.py       # PCM playback and stop signaling
  vad/detector.py         # Silero VAD integration
  asr/transcriber.py      # Whisper API wrapper
  llm/chat.py             # streaming GPT chat + sentence splitting
  tts/synthesizer.py      # streaming TTS PCM generator
  pipeline/orchestrator.py # end-to-end pipeline + barge-in control
tests/
  test_asr.py
  test_vad.py
  test_pipeline.py

Roadmap

Add packaging metadata for PyPI publishing
Add benchmark script for latency profiling
Add optional local/offline ASR and TTS backends
Add configurable wake-word mode

Contributing

Contributions are welcome. Please read CONTRIBUTING.md before opening a PR.

Security

Please report vulnerabilities privately as described in SECURITY.md.

License

This project is licensed under the MIT License. See LICENSE.
