AI Voice Agents that work
Jobline is a proof-of-concept (POC) application for AI voice agents built with Phoenix 1.8 LiveView and Elixir 1.15+. The application provides a real-time voice-based conversational interface with continuous conversation mode, automatic voice activity detection, AI-powered responses, and streaming text-to-speech synthesis.
- ✅ Continuous Conversation Mode: Natural, telephony-like conversation experience
- ✅ Voice Activity Detection (VAD): Automatic speech detection with server-side VAD (Cartesia)
- ✅ Real-time Speech-to-Text: Streaming transcription with Cartesia Ink-Whisper
- ✅ AI Conversation: Powered by OpenAI for intelligent responses
- ✅ Streaming Text-to-Speech: Real-time audio synthesis with Cartesia Sonic 3
- ✅ Full-Duplex Interruption: Interrupt AI mid-response with new input
- ✅ Real-time Audio Capture: Browser-based microphone access with AudioWorklet
- ✅ Low-Latency Streaming: Sub-1-second perceived latency with 500ms PCM chunks
- ✅ Audio Visualization: Live frequency spectrum with conversation state colors
- ✅ Conversation History: Persistent message history with database storage
- ✅ LiveView Real-time UI: Instant state updates without page reloads
- ✅ Responsive Design: Tailwind CSS v4 styling with dark theme
- ✅ No FFmpeg Required: Pure browser + Elixir solution, simplified deployment
- Backend: Elixir 1.15+, Phoenix 1.8, Phoenix LiveView
- Database: PostgreSQL with Ecto
- Frontend: JavaScript (ES2022), esbuild
- Styling: Tailwind CSS v4 (no config file, @import syntax)
- HTTP Client: Req library for AI service integration
- Audio: Web Audio API, MediaRecorder API
- Elixir 1.15 or later
- Erlang/OTP 26 or later
- PostgreSQL 14 or later
- Node.js 18 or later (for asset compilation)
- Modern browser with AudioWorklet support (Chrome 66+, Firefox 76+, Safari 14.1+, Edge 79+)
- Clone the repository:
  git clone <repository-url>
  cd jobline
- Configure environment variables:
  cp .env.example .env
  Edit .env and add your API keys:
  CARTESIA_API_KEY: Get your API key from the Cartesia Console
- Install dependencies and set up the database:
  mix setup
This command will:
- Install Elixir dependencies
- Create the database
- Run migrations
- Install and build assets (Tailwind, esbuild)
- Start the Phoenix server:
  mix phx.server
  Or start it with an IEx console:
  iex -S mix phx.server
- Visit localhost:4000 in your browser
# Setup (first time)
mix setup # Install deps, create DB, run migrations, setup assets
# Development
mix phx.server # Start Phoenix server
iex -S mix phx.server # Start server with IEx console
# Testing
mix test # Run all tests
mix test test/path/to/file.exs # Run specific test file
mix test --failed # Run previously failed tests
# Database
mix ecto.create # Create database
mix ecto.migrate # Run migrations
mix ecto.reset # Drop, create, migrate, and seed
mix ecto.gen.migration name # Generate new migration
# Assets
mix assets.setup # Install Tailwind and esbuild
mix assets.build # Build assets for development
mix assets.deploy # Build minified assets for production
# Code Quality
mix precommit # Run compile, format, and test
Always run before committing:
mix precommit
This ensures the code compiles without warnings, is properly formatted, and passes all tests.
The real-time audio streaming interval can be configured in config/config.exs:
config :jobline,
  audio_chunk_interval: 500 # milliseconds (Phase 2: 500ms for real-time)
Phase 2 uses 500ms chunks for sub-1-second latency:
- 500ms (default): Optimal balance of latency and network efficiency
- 200-400ms: Lower latency, more frequent network requests
- 600-1000ms: Slightly higher latency, fewer requests
Note: Values below 200ms may overwhelm the STT service; values above 1000ms reduce real-time feel.
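To make the trade-off above concrete, the following sketch works out how many samples and bytes each chunk carries for a given interval. It assumes 16 kHz mono 16-bit PCM (the sample rate comes from the architecture notes; mono and 16-bit are assumptions), and chunkSize is a hypothetical helper, not part of the codebase.

```javascript
// Sketch: translate a chunk interval into samples, bytes, and Base64 size.
// Assumption: 16 kHz, mono, signed 16-bit PCM.
const SAMPLE_RATE_HZ = 16000;
const BYTES_PER_SAMPLE = 2; // one Int16 per sample

function chunkSize(intervalMs, sampleRate = SAMPLE_RATE_HZ) {
  const samples = Math.round((sampleRate * intervalMs) / 1000);
  const bytes = samples * BYTES_PER_SAMPLE;
  // Base64 encodes every 3 bytes as 4 characters (padded up).
  const base64Chars = Math.ceil(bytes / 3) * 4;
  const chunksPerSecond = 1000 / intervalMs;
  return { samples, bytes, base64Chars, chunksPerSecond };
}

for (const ms of [200, 500, 1000]) {
  console.log(`${ms}ms`, chunkSize(ms));
}
```

Under these assumptions a 500ms chunk holds 8000 samples (16000 bytes of PCM), sent twice per second. Shorter intervals shrink each message but multiply the number of messages.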
User speaks → Microphone → getUserMedia
↓
MediaStreamSource (Web Audio API)
↓
AudioContext (16kHz sample rate)
↓
AudioWorkletNode (pcm-processor)
├─ Float32 → Int16 PCM conversion
└─ Buffer to 500ms chunks
↓
Main Thread (port.postMessage)
↓
Base64 encode + metadata
↓
Phoenix Channel (every 500ms)
↓
TalkLive (LiveView)
↓
SessionWorker (GenServer)
└─ Direct PCM passthrough (no FFmpeg!)
↓
Cartesia WebSocket (binary PCM)
↓
Ink-Whisper STT (streaming)
↓
Interim + Final Transcripts
↓
LiveView → Browser Display
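The two steps attributed to the AudioWorklet above (Float32 → Int16 conversion and buffering into fixed-size chunks) can be sketched as plain functions. This is an illustration under assumptions, not the project's pcm-processor.js: floatTo16BitPCM and PcmChunker are hypothetical names, and in a real worklet this logic would sit inside process() and hand chunks to the main thread via port.postMessage.

```javascript
// Convert Web Audio Float32 samples in [-1, 1] to signed 16-bit PCM.
function floatTo16BitPCM(float32Samples) {
  const pcm = new Int16Array(float32Samples.length);
  for (let i = 0; i < float32Samples.length; i++) {
    // Clamp first so out-of-range samples do not wrap around.
    const s = Math.max(-1, Math.min(1, float32Samples[i]));
    // Negative values scale toward -32768, positive values toward 32767.
    pcm[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return pcm;
}

// Accumulate small PCM blocks and emit one chunk whenever the buffer is full.
class PcmChunker {
  constructor(samplesPerChunk, onChunk) {
    this.buffer = new Int16Array(samplesPerChunk);
    this.filled = 0;
    this.onChunk = onChunk;
  }

  push(pcm) {
    let offset = 0;
    while (offset < pcm.length) {
      const n = Math.min(pcm.length - offset, this.buffer.length - this.filled);
      this.buffer.set(pcm.subarray(offset, offset + n), this.filled);
      this.filled += n;
      offset += n;
      if (this.filled === this.buffer.length) {
        this.onChunk(this.buffer.slice()); // copy, so the buffer can be reused
        this.filled = 0;
      }
    }
  }
}

// Usage: 8000 samples per chunk corresponds to 500ms at 16 kHz.
const chunker = new PcmChunker(8000, (chunk) => console.log("chunk", chunk.length));
chunker.push(floatTo16BitPCM(new Float32Array(128))); // one small render block
```

Copying the buffer on emit keeps the chunk stable while the next one is being filled, at the cost of one allocation per chunk.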
- TalkRecorder Hook: Manages audio recording and streaming
- Captures microphone input with Web Audio API
- Streams audio chunks at configurable intervals
- Implements retry logic with exponential backoff
- Tracks chunk sequence and session metadata
- TalkLive: LiveView module handling real-time interactions
- audio_chunk event: Receives streaming audio chunks
- recording_complete event: Signals end of recording session
- chunk_send_failed event: Handles failed chunk transmissions
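The retry logic with exponential backoff mentioned for the TalkRecorder hook could look roughly like the sketch below. The helper names, the 100ms base delay, the 2-second cap, and the attempt count are all assumptions for illustration. How send is wired to the channel, and how a final failure leads to chunk_send_failed, is left open.

```javascript
const defaultSleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Delay before retrying after failed attempt `attempt` (0-based):
// doubles each time and is capped at maxMs.
function backoffDelay(attempt, baseMs = 100, maxMs = 2000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Call `send` until it resolves or the attempts are used up.
// `send` is any function returning a promise (hypothetical transport wrapper).
async function sendWithRetry(send, options = {}) {
  const { maxAttempts = 4, baseMs = 100, maxMs = 2000, sleep = defaultSleep } = options;
  let lastError;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await send(attempt);
    } catch (err) {
      lastError = err;
      // Wait only if another attempt will follow.
      if (attempt < maxAttempts - 1) {
        await sleep(backoffDelay(attempt, baseMs, maxMs));
      }
    }
  }
  throw lastError; // caller can report the failure for this chunk
}
```

With these defaults the waits between attempts are 100ms, 200ms, and 400ms, which keeps the total retry window close to one or two chunk intervals.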
- Recording Start: User clicks microphone button
- Chunk Streaming: Audio chunks sent at the configured interval (500ms by default) with metadata:
- sequence: Chunk number in current session
- timestamp: When chunk was captured
- bytes: Chunk size
- chunk: Base64-encoded audio data
- Recording Stop: User releases button
- Completion Signal: recording_complete event with session summary
- [Future] Processing: STT → AI → TTS pipeline
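The chunk metadata listed in the flow above can be assembled as in this minimal sketch. The field names follow the list; the helper names, the millisecond timestamp, and the exact payload shape are assumptions rather than the project's actual format.

```javascript
// Base64-encode raw bytes; uses Buffer in Node and btoa in browsers.
function toBase64(bytes) {
  if (typeof Buffer !== "undefined") {
    return Buffer.from(bytes.buffer, bytes.byteOffset, bytes.byteLength).toString("base64");
  }
  let binary = "";
  for (let i = 0; i < bytes.length; i++) binary += String.fromCharCode(bytes[i]);
  return btoa(binary);
}

// Build the payload for one audio chunk (hypothetical helper).
function buildChunkPayload(pcm, sequence, now = Date.now()) {
  // View the Int16 samples as raw bytes (platform byte order, little-endian in practice).
  const bytes = new Uint8Array(pcm.buffer, pcm.byteOffset, pcm.byteLength);
  return {
    sequence,                // chunk number in the current session
    timestamp: now,          // capture time, here in ms since the Unix epoch (assumption)
    bytes: bytes.byteLength, // size of the raw PCM before Base64 encoding
    chunk: toBase64(bytes),  // Base64-encoded audio data
  };
}

console.log(buildChunkPayload(Int16Array.of(1, 2), 0));
```

Reporting bytes as the pre-encoding size lets the server sanity-check the decoded chunk against what the browser claims to have captured.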
Tested and supported on:
- ✅ Chrome/Chromium 66+ (recommended)
- ✅ Firefox 76+
- ✅ Edge 79+
- ✅ Safari 14.1+ (macOS/iOS)
Requires:
- AudioWorklet API support (for real-time PCM conversion)
- Web Audio API support
- Microphone permissions
Note: Phase 2 requires modern browsers with AudioWorklet support. Legacy browsers are not supported.
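A quick way to confirm these requirements at runtime is to feature-detect before starting a session. The sketch below is one possible check and not the project's own code: checkAudioSupport is a hypothetical helper that takes the global object as a parameter, so it can run against window in a browser or a stub elsewhere.

```javascript
// Report which of the required browser capabilities are missing.
function checkAudioSupport(env) {
  const missing = [];
  if (!env.isSecureContext) {
    missing.push("secure context (HTTPS or localhost)");
  }
  if (!(env.AudioContext || env.webkitAudioContext)) {
    missing.push("Web Audio API");
  }
  if (typeof env.AudioWorkletNode !== "function") {
    missing.push("AudioWorklet");
  }
  const media = env.navigator && env.navigator.mediaDevices;
  if (!media || typeof media.getUserMedia !== "function") {
    missing.push("getUserMedia (microphone access)");
  }
  return { supported: missing.length === 0, missing };
}

// In a browser: const result = checkAudioSupport(window);
```

Returning the list of missing capabilities, rather than a bare boolean, makes it easier to show the user a specific message. Note that this only checks API availability; microphone permission is still requested when getUserMedia is called.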
jobline/
├── assets/
│   ├── js/
│   │   ├── app.js                     # Main JavaScript entry point
│   │   ├── pcm-processor.js           # AudioWorklet processor for PCM conversion
│   │   └── hooks/
│   │       └── talk_hooks.js          # Real-time audio streaming with Web Audio API
│   └── css/
│       └── app.css                    # Tailwind CSS styles
├── lib/
│   ├── jobline/
│   │   └── stt/
│   │       ├── session_worker.ex      # STT session GenServer (simplified)
│   │       └── cartesia_websocket.ex  # Cartesia WebSocket client
│   └── jobline_web/
│       ├── live/
│       │   └── talk_live.ex           # Main voice interface LiveView
│       └── router.ex                  # Route definitions
├── config/
│   └── config.exs                     # Application configuration
├── priv/
│   └── repo/
│       └── migrations/                # Database migrations
└── test/                              # Test files
- ✅ Browser-based audio capture and streaming
- ✅ AudioWorklet for PCM conversion
- ✅ Real-time chunk streaming
- ✅ Core UI and visualization
- ✅ Cartesia Ink-Whisper integration
- ✅ Real-time streaming transcription
- ✅ Interim and final transcript handling
- ✅ Sub-1-second latency
- ✅ OpenAI conversation integration
- ✅ Cartesia Sonic 3 TTS streaming
- ✅ Real-time audio playback
- ✅ Conversation history persistence
- ✅ Server-side Voice Activity Detection (VAD)
- ✅ Automatic turn-taking
- ✅ Conversation timeout management
- ✅ Full-duplex interruption support
- ✅ Telephony-like natural conversation experience
- Ensure HTTPS or localhost (browsers require secure context)
- Check browser permissions in settings
- Look for microphone icon in address bar
- Verify microphone is connected and working
- Check browser console for errors
- Ensure MediaRecorder API is supported
- Check Phoenix server logs for errors
- Verify network connection
- Look for retry attempts in browser console
This is a POC project. Before contributing:
- Read the guidelines in CLAUDE.md and AGENTS.md
- Run mix precommit before committing
- Follow existing code patterns and conventions
[Specify your license here]