Jobline

AI Voice Agents that work

Jobline is a proof-of-concept (POC) application for AI voice agents built with Phoenix 1.8 LiveView and Elixir 1.15+. The application provides a real-time voice-based conversational interface with continuous conversation mode, automatic voice activity detection, AI-powered responses, and streaming text-to-speech synthesis.

Features

Current Implementation

✅ Continuous Conversation Mode: Natural, telephony-like conversation experience
✅ Voice Activity Detection (VAD): Automatic speech detection with server-side VAD (Cartesia)
✅ Real-time Speech-to-Text: Streaming transcription with Cartesia Ink-Whisper
✅ AI Conversation: Powered by OpenAI for intelligent responses
✅ Streaming Text-to-Speech: Real-time audio synthesis with Cartesia Sonic 3
✅ Full-Duplex Interruption: Interrupt AI mid-response with new input
✅ Real-time Audio Capture: Browser-based microphone access with AudioWorklet
✅ Low-Latency Streaming: Sub-1-second perceived latency with 500ms PCM chunks
✅ Audio Visualization: Live frequency spectrum with conversation state colors
✅ Conversation History: Persistent message history with database storage
✅ LiveView Real-time UI: Instant state updates without page reloads
✅ Responsive Design: Tailwind CSS v4 styling with dark theme
✅ No FFmpeg Required: Pure browser + Elixir solution, simplified deployment

Tech Stack

Backend: Elixir 1.15+, Phoenix 1.8, Phoenix LiveView
Database: PostgreSQL with Ecto
Frontend: JavaScript (ES2022), esbuild
Styling: Tailwind CSS v4 (no config file, @import syntax)
HTTP Client: Req library for AI service integration
Audio: Web Audio API (AudioWorklet), getUserMedia

Getting Started

Prerequisites

Elixir 1.15 or later
Erlang/OTP 26 or later
PostgreSQL 14 or later
Node.js 18 or later (for asset compilation)
Modern browser with AudioWorklet support (Chrome 66+, Firefox 76+, Safari 14.1+, Edge 79+)

Installation

Clone the repository:
```
git clone <repository-url>
cd jobline
```
Configure environment variables:
```
cp .env.example .env
```
Edit .env and add your API keys:
- CARTESIA_API_KEY: Get your API key from Cartesia Console
Install dependencies and set up the database:
```
mix setup
```
This command will:
- Install Elixir dependencies
- Create the database
- Run migrations
- Install and build assets (Tailwind, esbuild)
Start the Phoenix server:
```
mix phx.server
```
Or start with IEx console:
```
iex -S mix phx.server
```
Visit localhost:4000 in your browser

Development

Common Commands

# Setup (first time)
mix setup                          # Install deps, create DB, run migrations, setup assets

# Development
mix phx.server                     # Start Phoenix server
iex -S mix phx.server             # Start server with IEx console

# Testing
mix test                          # Run all tests
mix test test/path/to/file.exs    # Run specific test file
mix test --failed                 # Run previously failed tests

# Database
mix ecto.create                   # Create database
mix ecto.migrate                  # Run migrations
mix ecto.reset                    # Drop, create, migrate, and seed
mix ecto.gen.migration name       # Generate new migration

# Assets
mix assets.setup                  # Install Tailwind and esbuild
mix assets.build                  # Build assets for development
mix assets.deploy                 # Build minified assets for production

# Code Quality
mix precommit                     # Run compile, format, and test

Pre-commit Workflow

Always run before committing:

mix precommit

This ensures code is compiled without warnings, properly formatted, and all tests pass.

Configuration

Audio Chunk Interval

The real-time audio streaming interval can be configured in config/config.exs:

config :jobline,
  audio_chunk_interval: 500  # milliseconds (Phase 2: 500ms for real-time)

Phase 2 uses 500ms chunks for sub-1-second latency:

500ms (default): Optimal balance of latency and network efficiency
200-400ms: Lower latency, more frequent network requests
600-1000ms: Slightly higher latency, fewer requests

Note: Values below 200ms may overwhelm the STT service; values above 1000ms reduce real-time feel.
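
To make the trade-off above concrete, the following sketch works out how much data one chunk carries at a given interval. It assumes mono 16-bit PCM at the 16 kHz sample rate shown in the pipeline diagram; the function name `chunkSize` and its return shape are illustrative and not taken from the Jobline code.

```javascript
// Size of one mono 16-bit PCM chunk for a given interval.
// Assumption: 16 kHz mono audio; names here are hypothetical.
function chunkSize(intervalMs, sampleRate = 16000) {
  const samples = Math.round((sampleRate * intervalMs) / 1000);
  const pcmBytes = samples * 2; // Int16 = 2 bytes per sample
  const base64Chars = Math.ceil(pcmBytes / 3) * 4; // Base64 maps 3 bytes to 4 chars
  const chunksPerSecond = 1000 / intervalMs;
  return { samples, pcmBytes, base64Chars, chunksPerSecond };
}

console.log(chunkSize(500));
// { samples: 8000, pcmBytes: 16000, base64Chars: 21336, chunksPerSecond: 2 }
```

Under these assumptions the raw audio rate is constant (32,000 bytes per second) regardless of the interval; a shorter interval only changes how many messages carry that audio and how long each one waits before being sent.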

Architecture

Audio Processing Pipeline (Phase 2)

User speaks → Microphone → getUserMedia
                                ↓
                    MediaStreamSource (Web Audio API)
                                ↓
                    AudioContext (16kHz sample rate)
                                ↓
                    AudioWorkletNode (pcm-processor)
                    ├─ Float32 → Int16 PCM conversion
                    └─ Buffer to 500ms chunks
                                ↓
                    Main Thread (port.postMessage)
                                ↓
                    Base64 encode + metadata
                                ↓
                    Phoenix Channel (every 500ms)
                                ↓
                    TalkLive (LiveView)
                                ↓
                    SessionWorker (GenServer)
                    └─ Direct PCM passthrough (no FFmpeg!)
                                ↓
                    Cartesia WebSocket (binary PCM)
                                ↓
                    Ink-Whisper STT (streaming)
                                ↓
                    Interim + Final Transcripts
                                ↓
                    LiveView → Browser Display
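
The two AudioWorklet steps in the diagram (Float32 → Int16 conversion and buffering into 500ms chunks) could look roughly like the sketch below. This is not the contents of `pcm-processor.js`; `floatTo16BitPCM` and `ChunkBuffer` are hypothetical names, and the code is written as plain functions so it can run outside an AudioWorklet.

```javascript
// Convert Web Audio Float32 samples in [-1, 1] to 16-bit signed PCM.
function floatTo16BitPCM(float32) {
  const out = new Int16Array(float32.length);
  for (let i = 0; i < float32.length; i++) {
    const s = Math.max(-1, Math.min(1, float32[i])); // clamp out-of-range samples
    out[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return out;
}

// Accumulate small blocks of samples until a full chunk is available.
class ChunkBuffer {
  constructor(samplesPerChunk) {
    this.buffer = new Int16Array(samplesPerChunk);
    this.offset = 0;
  }

  // Append samples; returns an array of completed chunks (possibly empty).
  push(int16) {
    const chunks = [];
    let i = 0;
    while (i < int16.length) {
      const n = Math.min(int16.length - i, this.buffer.length - this.offset);
      this.buffer.set(int16.subarray(i, i + n), this.offset);
      this.offset += n;
      i += n;
      if (this.offset === this.buffer.length) {
        chunks.push(this.buffer.slice()); // copy, so the buffer can be reused
        this.offset = 0;
      }
    }
    return chunks;
  }
}

// 500ms at 16 kHz is 8000 samples per chunk.
const chunker = new ChunkBuffer(8000);
const ready = chunker.push(floatTo16BitPCM(new Float32Array(128)));
console.log(ready.length); // 0: one 128-sample block does not fill a chunk yet
```

An AudioWorklet processor receives audio in blocks of 128 frames, and 8000 is not a multiple of 128, so a chunk boundary usually falls in the middle of a block. That is why `push` carries leftover samples over into the next chunk.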

Key Components

Frontend (`assets/js/hooks/talk_hooks.js`)

TalkRecorder Hook: Manages audio recording and streaming
- Captures microphone input with Web Audio API
- Streams audio chunks at configurable intervals
- Implements retry logic with exponential backoff
- Tracks chunk sequence and session metadata
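
As an illustration of the retry behaviour listed above, here is a minimal sketch of exponential backoff around a send function. The names `backoffDelay` and `sendWithRetry`, the default delays, and the retry count are assumptions for the example; the actual values used by the TalkRecorder hook are not stated here.

```javascript
// Delay before retry number `attempt` (0-based): doubles each time, up to a cap.
function backoffDelay(attempt, baseMs = 250, maxMs = 4000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Call `send` up to maxRetries + 1 times. Resolves with its result,
// or rethrows the last error once the retries are exhausted.
async function sendWithRetry(send, { maxRetries = 3, baseMs = 250, maxMs = 4000 } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await send(attempt);
    } catch (err) {
      if (attempt >= maxRetries) throw err;
      const delay = backoffDelay(attempt, baseMs, maxMs);
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}
```

In a hook, `send` would wrap whatever pushes the chunk to the server, and the final rejection is a natural place to report a `chunk_send_failed` event. With 500ms chunks, long backoff delays can leave several chunks queued, so a small cap is probably the safer choice.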

Backend (`lib/jobline_web/live/talk_live.ex`)

TalkLive: LiveView module handling real-time interactions
- audio_chunk event: Receives streaming audio chunks
- recording_complete event: Signals end of recording session
- chunk_send_failed event: Handles failed chunk transmissions

Data Flow

Recording Start: User clicks microphone button
Chunk Streaming: Audio chunks sent every 500ms by default (the configured audio_chunk_interval) with metadata:
- sequence: Chunk number in current session
- timestamp: When chunk was captured
- bytes: Chunk size
- chunk: Base64-encoded audio data
Recording Stop: User releases button
Completion Signal: recording_complete event with session summary
Processing: STT → AI → TTS pipeline (implemented in Phases 2 and 3)
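
The chunk metadata listed above could be assembled as in the sketch below. Only the four field names (`sequence`, `timestamp`, `bytes`, `chunk`) come from this README; the helper names, the use of epoch milliseconds for `timestamp`, and the use of the `btoa` global are assumptions.

```javascript
// Encode raw bytes as Base64 using the btoa global (browsers, recent Node.js).
function bytesToBase64(bytes) {
  let binary = "";
  for (let i = 0; i < bytes.length; i++) {
    binary += String.fromCharCode(bytes[i]);
  }
  return btoa(binary);
}

// Build the payload for one audio chunk.
// `now` is injectable so the function stays deterministic in tests.
function buildChunkPayload(int16Chunk, sequence, now = Date.now()) {
  // View the Int16 samples as raw bytes without copying.
  const bytes = new Uint8Array(
    int16Chunk.buffer,
    int16Chunk.byteOffset,
    int16Chunk.byteLength
  );
  return {
    sequence,                    // chunk number in the current session
    timestamp: now,              // assumed: capture time in epoch milliseconds
    bytes: bytes.length,         // chunk size in bytes before Base64 encoding
    chunk: bytesToBase64(bytes), // Base64-encoded audio data
  };
}
```

Inside a LiveView hook, a payload like this would typically be sent with `this.pushEvent("audio_chunk", payload)`. The byte order of the encoded samples follows the platform, which is little-endian on all mainstream browsers.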

Browser Compatibility

Tested and supported on:

✅ Chrome/Chromium 66+ (recommended)
✅ Firefox 76+
✅ Edge 79+
✅ Safari 14.1+ (macOS/iOS)

Requires:

AudioWorklet API support (for real-time PCM conversion)
Web Audio API support
Microphone permissions

Note: Phase 2 requires modern browsers with AudioWorklet support. Legacy browsers are not supported.
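
The requirements above can be checked before recording starts. The following is a hedged sketch of such a feature check; `checkAudioSupport` is a hypothetical helper, not part of the Jobline code. It takes the global object as a parameter so it can be exercised outside a browser.

```javascript
// Report which required browser capabilities are missing.
// Pass `window` in a browser; any window-like object works for testing.
function checkAudioSupport(env) {
  const missing = [];
  if (!env.isSecureContext) {
    missing.push("secure context (HTTPS or localhost)");
  }
  if (typeof env.AudioContext !== "function" &&
      typeof env.webkitAudioContext !== "function") {
    missing.push("Web Audio API");
  }
  if (typeof env.AudioWorkletNode !== "function") {
    missing.push("AudioWorklet");
  }
  if (!env.navigator?.mediaDevices?.getUserMedia) {
    missing.push("getUserMedia (microphone access)");
  }
  return { supported: missing.length === 0, missing };
}
```

In the browser this would be called as `checkAudioSupport(window)`, and the `missing` list can be shown to the user instead of failing silently. Note that it only detects API availability; microphone permission is still requested separately when `getUserMedia` is called.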

Project Structure

jobline/
├── assets/
│   ├── js/
│   │   ├── app.js              # Main JavaScript entry point
│   │   ├── pcm-processor.js    # AudioWorklet processor for PCM conversion
│   │   └── hooks/
│   │       └── talk_hooks.js   # Real-time audio streaming with Web Audio API
│   └── css/
│       └── app.css             # Tailwind CSS styles
├── lib/
│   ├── jobline/
│   │   └── stt/
│   │       ├── session_worker.ex      # STT session GenServer (simplified)
│   │       └── cartesia_websocket.ex  # Cartesia WebSocket client
│   └── jobline_web/
│       ├── live/
│       │   └── talk_live.ex    # Main voice interface LiveView
│       └── router.ex           # Route definitions
├── config/
│   └── config.exs              # Application configuration
├── priv/
│   └── repo/
│       └── migrations/         # Database migrations
└── test/                       # Test files

Development Phases

Phase 1: Audio Infrastructure ✅ Complete

✅ Browser-based audio capture and streaming
✅ AudioWorklet for PCM conversion
✅ Real-time chunk streaming
✅ Core UI and visualization

Phase 2: STT Integration ✅ Complete

✅ Cartesia Ink-Whisper integration
✅ Real-time streaming transcription
✅ Interim and final transcript handling
✅ Sub-1-second latency

Phase 3: AI & TTS Integration ✅ Complete

✅ OpenAI conversation integration
✅ Cartesia Sonic 3 TTS streaming
✅ Real-time audio playback
✅ Conversation history persistence

Phase 4: Continuous Conversation Mode ✅ Complete

✅ Server-side Voice Activity Detection (VAD)
✅ Automatic turn-taking
✅ Conversation timeout management
✅ Full-duplex interruption support
✅ Telephony-like natural conversation experience
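
Turn-taking, timeouts, and interruption can be pictured as a small state machine. The sketch below is purely illustrative: the state names (`idle`, `listening`, `thinking`, `speaking`) and event names are hypothetical and are not taken from the Jobline source.

```javascript
// Hypothetical conversation states and the events that move between them.
const TRANSITIONS = {
  idle:      { start: "listening" },
  listening: { speech_end: "thinking", timeout: "idle", stop: "idle" },
  thinking:  { reply_ready: "speaking", speech_start: "listening", stop: "idle" },
  // speech_start while speaking models full-duplex interruption:
  // the user talks over the AI and the agent goes back to listening.
  speaking:  { playback_done: "listening", speech_start: "listening", stop: "idle" },
};

// Return the next state, or the current one if the event does not apply.
function nextState(state, event) {
  return TRANSITIONS[state]?.[event] ?? state;
}

// One uninterrupted turn followed by an interrupted one.
const events = [
  "start", "speech_end", "reply_ready", "playback_done",
  "speech_end", "reply_ready", "speech_start",
];
console.log(events.reduce(nextState, "idle")); // "listening"
```

Keeping the transitions in a single table makes it easy to see which events are ignored in each state, for example a late `reply_ready` that arrives after the user has already interrupted.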

Troubleshooting

Microphone Access Issues

Ensure HTTPS or localhost (browsers require secure context)
Check browser permissions in settings
Look for microphone icon in address bar

Audio Not Recording

Verify microphone is connected and working
Check browser console for errors
Ensure the AudioWorklet API is supported (see Browser Compatibility)

Chunks Not Sending

Check Phoenix server logs for errors
Verify network connection
Look for retry attempts in browser console

Contributing

This is a POC project. Before contributing:

Read the guidelines in CLAUDE.md and AGENTS.md
Run mix precommit before committing
Follow existing code patterns and conventions

License

[Specify your license here]

Resources

Phoenix & Elixir

Audio APIs

AI Services

Cartesia - STT & TTS
OpenAI - Conversational AI
