Real-Time AI Voice Agent
A real-time voice agent built using LiveKit Agents, AssemblyAI for Speech-to-Text (STT), Cartesia for Text-to-Speech (TTS), Mistral / OpenAI LLMs, and Silero Voice Activity Detection (VAD).
This project enables real-time, bidirectional voice interaction with AI models by combining streaming audio processing with large language models. Practical use cases include conversational assistants, voice-driven applications, and real-time dialogue systems.
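As a rough illustration of how these stages fit together, the sketch below chains VAD, STT, LLM, and TTS into one conversational turn per utterance. It is not the LiveKit Agents API or this project's code: `VoicePipeline`, its field names, and the stand-in lambda stages are all hypothetical, and real providers stream audio asynchronously rather than working on whole byte strings.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List

# Hypothetical stage signatures (assumptions for illustration only).
# Real providers such as AssemblyAI or Cartesia expose their own streaming APIs.
IsSpeech = Callable[[bytes], bool]    # VAD: does this frame contain speech?
Transcribe = Callable[[bytes], str]   # STT: utterance audio -> text
Respond = Callable[[str], str]        # LLM: user text -> reply text
Synthesize = Callable[[str], bytes]   # TTS: reply text -> audio


@dataclass
class VoicePipeline:
    is_speech: IsSpeech
    transcribe: Transcribe
    respond: Respond
    synthesize: Synthesize

    def run(self, frames: Iterable[bytes]) -> Iterator[bytes]:
        """Yield one synthesized reply per detected utterance."""
        utterance: List[bytes] = []
        for frame in frames:
            if self.is_speech(frame):
                utterance.append(frame)  # still inside an utterance
            elif utterance:
                # First non-speech frame after speech ends the turn.
                yield self._turn(b"".join(utterance))
                utterance = []
        if utterance:
            # Flush speech that was still buffered when the stream ended.
            yield self._turn(b"".join(utterance))

    def _turn(self, audio: bytes) -> bytes:
        text = self.transcribe(audio)
        reply = self.respond(text)
        return self.synthesize(reply)


# Stand-in stages so the sketch runs without provider credentials.
pipeline = VoicePipeline(
    is_speech=lambda frame: frame != b"",
    transcribe=lambda audio: audio.decode("utf-8"),
    respond=lambda text: f"You said: {text}",
    synthesize=lambda reply: reply.encode("utf-8"),
)

replies = list(pipeline.run([b"hel", b"lo", b"", b"", b"bye"]))
print(replies)
```

Because each stage is just a callable, swapping one provider for another only changes the function passed in, which is the idea behind a pluggable design.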
Features:
- Live voice processing pipeline
  - Continuous speech capture and decoding
  - Real-time agent responses
- Speech-to-Text (STT)
  - Powered by AssemblyAI (or other ASR systems)
- Text-to-Speech (TTS)
  - Cartesia or other configurable TTS providers
- Voice Activity Detection (VAD)
  - Silero VAD detects when the user is speaking, so silence and background noise are not passed on for transcription
- Large Language Model integration
  - Mistral / OpenAI models for reasoning and dialogue
- Pluggable architecture
  - Modular audio, model, and transport layers
- Test suite
  - Simple import tests included
Use cases:
- Conversational voice assistants
- Interactive voice experiences
- Real-time chat with AI using voice
- Accessibility tools (hands-free interfaces)
- Rapid prototyping for voice-enabled agents
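The import tests mentioned in the feature list are not shown in this document. A minimal sketch of what such a smoke test might look like is given below, using only the standard library. The helper name `check_imports` is hypothetical, and standard-library module names stand in for the project's real dependencies so the example runs anywhere.

```python
import importlib.util
from typing import Dict, Iterable


def check_imports(modules: Iterable[str]) -> Dict[str, bool]:
    """Map each top-level module name to whether it can be located.

    Only top-level names are supported here: for dotted names,
    find_spec imports the parent package and may raise if it is missing.
    """
    return {name: importlib.util.find_spec(name) is not None for name in modules}


# Placeholder module list (an assumption); a real test would name the
# packages the project actually depends on.
results = check_imports(["json", "asyncio", "surely_not_installed_module_xyz"])
missing = [name for name, found in results.items() if not found]
print("missing:", missing)
```

A check like this catches a broken environment early, before any audio or network code runs.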