nicx004/voice-agent

Real-Time AI Voice Agent
A real-time voice agent built using LiveKit Agents, AssemblyAI for Speech-to-Text (STT), Cartesia for Text-to-Speech (TTS), Mistral / OpenAI LLMs, and Silero Voice Activity Detection (VAD).

This project combines streaming audio processing with large language models to support real-time, bidirectional voice conversations with AI models. Practical use cases include conversational assistants, voice-driven applications, and real-time dialogue systems.


🧠 Features

  • Live voice processing pipeline
    • Continuous speech capture and decoding
    • Real-time agent responses
  • Speech-to-Text (STT)
    • Powered by AssemblyAI (or other ASR systems)
  • Text-to-Speech (TTS)
    • Cartesia or other configurable TTS providers
  • Voice Activity Detection (VAD)
    • Silero VAD detects when the user is speaking, which limits how much silence and background noise reaches the STT stage
  • Large Language Model integration
    • Mistral / OpenAI models for reasoning and dialogue
  • Pluggable architecture
    • Modular audio, model, and transport layers
  • Test suite
    • Simple import tests included
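
The pluggable pipeline described in the list above can be pictured with a minimal sketch. This is not the project's actual code and does not use the LiveKit Agents API: the `VoicePipeline` class, the four protocol names, and the stub providers are all hypothetical, standard-library-only stand-ins chosen to show how VAD, STT, LLM, and TTS stages could be swapped independently. A real agent would process audio asynchronously and frame by frame rather than in one synchronous call.

```python
from dataclasses import dataclass
from typing import Optional, Protocol, Sequence


# Hypothetical stage interfaces; a real provider (AssemblyAI, Cartesia, ...)
# would sit behind an adapter that satisfies one of these.
class VAD(Protocol):
    def is_speech(self, frame: bytes) -> bool: ...


class STT(Protocol):
    def transcribe(self, audio: bytes) -> str: ...


class LLM(Protocol):
    def reply(self, text: str) -> str: ...


class TTS(Protocol):
    def synthesize(self, text: str) -> bytes: ...


@dataclass
class VoicePipeline:
    """One conversational turn: VAD gate -> STT -> LLM -> TTS."""

    vad: VAD
    stt: STT
    llm: LLM
    tts: TTS

    def handle_turn(self, frames: Sequence[bytes]) -> Optional[bytes]:
        # Keep only frames the VAD marks as speech, so silence never reaches STT.
        speech = b"".join(f for f in frames if self.vad.is_speech(f))
        if not speech:
            return None  # nothing was said, so no reply is produced
        transcript = self.stt.transcribe(speech)
        answer = self.llm.reply(transcript)
        return self.tts.synthesize(answer)


# Stub providers for illustration only; they do no real audio or language work.
class NonZeroVAD:
    def is_speech(self, frame: bytes) -> bool:
        return any(frame)  # treat all-zero frames as silence


class EchoSTT:
    def transcribe(self, audio: bytes) -> str:
        return audio.decode("utf-8", errors="replace")


class UppercaseLLM:
    def reply(self, text: str) -> str:
        return text.upper()


class BytesTTS:
    def synthesize(self, text: str) -> bytes:
        return text.encode("utf-8")


pipeline = VoicePipeline(NonZeroVAD(), EchoSTT(), UppercaseLLM(), BytesTTS())
```

Because each stage is only required to match a small interface, replacing one provider (for example, a different TTS) would mean passing a different object to `VoicePipeline` without touching the other stages.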

🎯 Typical Use Cases

  • Conversational voice assistants
  • Interactive voice experiences
  • Real-time chat with AI using voice
  • Accessibility tools (hands-free interfaces)
  • Rapid prototyping for voice-enabled agents
