dbhowmick/jobline

Jobline

AI Voice Agents that work

Jobline is a proof-of-concept (POC) application for AI voice agents built with Phoenix 1.8 LiveView and Elixir 1.15+. The application provides a real-time voice-based conversational interface with continuous conversation mode, automatic voice activity detection, AI-powered responses, and streaming text-to-speech synthesis.

Features

Current Implementation

  • Continuous Conversation Mode: Natural, telephony-like conversation experience
  • Voice Activity Detection (VAD): Automatic speech detection with server-side VAD (Cartesia)
  • Real-time Speech-to-Text: Streaming transcription with Cartesia Ink-Whisper
  • AI Conversation: Powered by OpenAI for intelligent responses
  • Streaming Text-to-Speech: Real-time audio synthesis with Cartesia Sonic 3
  • Full-Duplex Interruption: Interrupt AI mid-response with new input
  • Real-time Audio Capture: Browser-based microphone access with AudioWorklet
  • Low-Latency Streaming: Sub-1-second perceived latency with 500ms PCM chunks
  • Audio Visualization: Live frequency spectrum with conversation state colors
  • Conversation History: Persistent message history with database storage
  • LiveView Real-time UI: Instant state updates without page reloads
  • Responsive Design: Tailwind CSS v4 styling with dark theme
  • No FFmpeg Required: Pure browser + Elixir solution, simplified deployment

Tech Stack

  • Backend: Elixir 1.15+, Phoenix 1.8, Phoenix LiveView
  • Database: PostgreSQL with Ecto
  • Frontend: JavaScript (ES2022), esbuild
  • Styling: Tailwind CSS v4 (no config file, @import syntax)
  • HTTP Client: Req library for AI service integration
  • Audio: Web Audio API with AudioWorklet (raw PCM capture)

Getting Started

Prerequisites

  • Elixir 1.15 or later
  • Erlang/OTP 26 or later
  • PostgreSQL 14 or later
  • Node.js 18 or later (for asset compilation)
  • Modern browser with AudioWorklet support (Chrome 66+, Firefox 76+, Safari 14.1+, Edge 79+)

Installation

  1. Clone the repository:

    git clone <repository-url>
    cd jobline
  2. Configure environment variables:

    cp .env.example .env

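    Edit .env and add your API keys for the services the app calls (Cartesia for speech-to-text and text-to-speech, OpenAI for conversation).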

  3. Install dependencies and set up the database:

    mix setup

    This command will:

    • Install Elixir dependencies
    • Create the database
    • Run migrations
    • Install and build assets (Tailwind, esbuild)
  4. Start the Phoenix server:

    mix phx.server

    Or start with IEx console:

    iex -S mix phx.server
  5. Visit localhost:4000 in your browser

Development

Common Commands

# Setup (first time)
mix setup                          # Install deps, create DB, run migrations, setup assets

# Development
mix phx.server                     # Start Phoenix server
iex -S mix phx.server             # Start server with IEx console

# Testing
mix test                          # Run all tests
mix test test/path/to/file.exs    # Run specific test file
mix test --failed                 # Run previously failed tests

# Database
mix ecto.create                   # Create database
mix ecto.migrate                  # Run migrations
mix ecto.reset                    # Drop, create, migrate, and seed
mix ecto.gen.migration name       # Generate new migration

# Assets
mix assets.setup                  # Install Tailwind and esbuild
mix assets.build                  # Build assets for development
mix assets.deploy                 # Build minified assets for production

# Code Quality
mix precommit                     # Run compile, format, and test

Pre-commit Workflow

Always run before committing:

mix precommit

This ensures the code compiles without warnings, is properly formatted, and passes all tests.

Configuration

Audio Chunk Interval

The real-time audio streaming interval can be configured in config/config.exs:

config :jobline,
  audio_chunk_interval: 500  # milliseconds (Phase 2: 500ms for real-time)

Phase 2 uses 500ms chunks for sub-1-second latency:

  • 500ms (default): Optimal balance of latency and network efficiency
  • 200-400ms: Lower latency, more frequent network requests
  • 600-1000ms: Slightly higher latency, fewer requests

Note: Values below 200ms may overwhelm the STT service; values above 1000ms reduce real-time feel.
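
To make the trade-off concrete, the sketch below estimates what each interval costs on the wire. It assumes mono 16-bit PCM at the 16 kHz sample rate shown in the pipeline diagram; `chunkSize` is a hypothetical helper for illustration, not a function from this codebase.

```javascript
// Assumed capture format: 16 kHz, mono, 16-bit signed PCM.
const SAMPLE_RATE_HZ = 16000;
const BYTES_PER_SAMPLE = 2;

// Estimate the size of one audio chunk for a given interval in milliseconds.
function chunkSize(intervalMs) {
  const samples = Math.round((SAMPLE_RATE_HZ * intervalMs) / 1000);
  const bytes = samples * BYTES_PER_SAMPLE;
  // Base64 encodes every 3 bytes as 4 characters, padding the last group.
  const base64Chars = Math.ceil(bytes / 3) * 4;
  const chunksPerSecond = 1000 / intervalMs;
  return { samples, bytes, base64Chars, chunksPerSecond };
}

for (const ms of [200, 500, 1000]) {
  console.log(ms, chunkSize(ms));
}
```

Under these assumptions the default 500 ms interval yields 8000 samples (16000 bytes of PCM) per chunk at two messages per second, while 200 ms yields 3200 samples at five messages per second.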

Architecture

Audio Processing Pipeline (Phase 2)

User speaks → Microphone → getUserMedia
                                ↓
                    MediaStreamSource (Web Audio API)
                                ↓
                    AudioContext (16kHz sample rate)
                                ↓
                    AudioWorkletNode (pcm-processor)
                    ├─ Float32 → Int16 PCM conversion
                    └─ Buffer to 500ms chunks
                                ↓
                    Main Thread (port.postMessage)
                                ↓
                    Base64 encode + metadata
                                ↓
                    Phoenix Channel (every 500ms)
                                ↓
                    TalkLive (LiveView)
                                ↓
                    SessionWorker (GenServer)
                    └─ Direct PCM passthrough (no FFmpeg!)
                                ↓
                    Cartesia WebSocket (binary PCM)
                                ↓
                    Ink-Whisper STT (streaming)
                                ↓
                    Interim + Final Transcripts
                                ↓
                    LiveView → Browser Display
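
The two steps inside the AudioWorkletNode (Float32 → Int16 conversion and buffering into fixed-size chunks) can be sketched as plain functions. This is a minimal illustration, not the contents of pcm-processor.js: the names `floatToInt16` and `PcmChunker` are hypothetical, and in a real worklet `push` would be called from `process()` with each 128-frame block, with `onChunk` forwarding the data through `port.postMessage`.

```javascript
// Convert Web Audio float samples (nominally -1.0..1.0) to 16-bit signed PCM.
function floatToInt16(float32) {
  const out = new Int16Array(float32.length);
  for (let i = 0; i < float32.length; i++) {
    const s = Math.max(-1, Math.min(1, float32[i])); // clamp out-of-range input
    out[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return out;
}

// Accumulate converted samples and emit a chunk each time the buffer fills.
class PcmChunker {
  constructor(samplesPerChunk, onChunk) {
    this.buffer = new Int16Array(samplesPerChunk);
    this.filled = 0;
    this.onChunk = onChunk;
  }

  push(float32) {
    const pcm = floatToInt16(float32);
    let offset = 0;
    while (offset < pcm.length) {
      const room = this.buffer.length - this.filled;
      const n = Math.min(pcm.length - offset, room);
      this.buffer.set(pcm.subarray(offset, offset + n), this.filled);
      this.filled += n;
      offset += n;
      if (this.filled === this.buffer.length) {
        this.onChunk(this.buffer.slice()); // copy so the buffer can be reused
        this.filled = 0;
      }
    }
  }
}

// Usage: 500 ms at 16 kHz is 8000 samples, which is not a multiple of 128,
// so a chunk boundary can fall in the middle of a render block.
const demoChunks = [];
const demoChunker = new PcmChunker(8000, (chunk) => demoChunks.push(chunk));
for (let i = 0; i < 63; i++) demoChunker.push(new Float32Array(128));
console.log(demoChunks.length, demoChunker.filled);
```

Copying the buffer before handing it off keeps the emitted chunk stable while the next one is being filled.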

Key Components

Frontend (assets/js/hooks/talk_hooks.js)

  • TalkRecorder Hook: Manages audio recording and streaming
    • Captures microphone input with Web Audio API
    • Streams audio chunks at configurable intervals
    • Implements retry logic with exponential backoff
    • Tracks chunk sequence and session metadata
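
The retry behavior mentioned above can be sketched roughly as follows. This is an assumption about the general technique, not the hook's actual code: `sendWithRetry`, its option names, and the default attempt count and delays are all hypothetical.

```javascript
const defaultSleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Retry an async send, doubling the wait after each failure.
// `send` receives the zero-based attempt number and should throw on failure.
async function sendWithRetry(
  send,
  { maxAttempts = 4, baseDelayMs = 100, sleep = defaultSleep } = {}
) {
  let lastError;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await send(attempt);
    } catch (err) {
      lastError = err;
      // Wait 100, 200, 400, ... ms, but not after the final attempt.
      if (attempt < maxAttempts - 1) {
        await sleep(baseDelayMs * 2 ** attempt);
      }
    }
  }
  throw lastError; // caller decides how to report the failure
}
```

Once the attempts are exhausted, the caller could report the failure to the server, for example through the chunk_send_failed event listed under the backend components.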

Backend (lib/jobline_web/live/talk_live.ex)

  • TalkLive: LiveView module handling real-time interactions
    • audio_chunk event: Receives streaming audio chunks
    • recording_complete event: Signals end of recording session
    • chunk_send_failed event: Handles failed chunk transmissions

Data Flow

  1. Recording Start: User clicks microphone button
  2. Chunk Streaming: Audio chunks sent every 500 ms by default (see Audio Chunk Interval) with metadata:
    • sequence: Chunk number in current session
    • timestamp: When chunk was captured
    • bytes: Chunk size
    • chunk: Base64-encoded audio data
  3. Recording Stop: User releases button
  4. Completion Signal: recording_complete event with session summary
  5. Processing: STT → AI → TTS pipeline
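
As an illustration of the chunk metadata listed in step 2, the sketch below builds one payload from a block of Int16 PCM. The field names come from the list above; the helper names, the millisecond-epoch timestamp format, and the exact encoding strategy are assumptions rather than the hook's real implementation.

```javascript
// Encode raw bytes as Base64 with btoa (available in browsers and Node 16+).
function bytesToBase64(bytes) {
  let binary = "";
  const step = 0x2000; // convert in slices to keep argument lists small
  for (let i = 0; i < bytes.length; i += step) {
    binary += String.fromCharCode(...bytes.subarray(i, i + step));
  }
  return btoa(binary);
}

// Build the payload for one chunk. `pcm` is an Int16Array of samples.
function buildChunkPayload(pcm, sequence, now = Date.now()) {
  // View the same memory as bytes (little-endian on common platforms).
  const bytes = new Uint8Array(pcm.buffer, pcm.byteOffset, pcm.byteLength);
  return {
    sequence,                    // chunk number in the current session
    timestamp: now,              // assumed format: milliseconds since epoch
    bytes: bytes.length,         // chunk size before Base64 encoding
    chunk: bytesToBase64(bytes), // Base64-encoded audio data
  };
}

// Inside a LiveView hook this could be sent with:
//   this.pushEvent("audio_chunk", buildChunkPayload(pcm, sequence));
```

Reporting the size before encoding lets the receiver sanity-check the decoded data against the `bytes` field.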

Browser Compatibility

Tested and supported on:

  • ✅ Chrome/Chromium 66+ (recommended)
  • ✅ Firefox 76+
  • ✅ Edge 79+
  • ✅ Safari 14.1+ (macOS/iOS)

Requires:

  • AudioWorklet API support (for real-time PCM conversion)
  • Web Audio API support
  • Microphone permissions

Note: Phase 2 requires modern browsers with AudioWorklet support. Legacy browsers are not supported.
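
A page can check for these requirements before starting a session. The function below is a hedged sketch: `missingCaptureFeatures` is a hypothetical name, and it only tests that the relevant globals exist, not that the user has granted microphone permission.

```javascript
// Return the names of required browser features that are missing.
// `env` defaults to the global object; passing it in keeps the check testable.
function missingCaptureFeatures(env = globalThis) {
  const missing = [];
  if (typeof (env.AudioContext || env.webkitAudioContext) !== "function") {
    missing.push("Web Audio API");
  }
  if (typeof env.AudioWorkletNode !== "function") {
    missing.push("AudioWorklet");
  }
  const media = env.navigator && env.navigator.mediaDevices;
  if (!media || typeof media.getUserMedia !== "function") {
    missing.push("getUserMedia");
  }
  return missing;
}

// Usage in the browser:
//   const missing = missingCaptureFeatures(window);
//   if (missing.length > 0) { /* show an "unsupported browser" message */ }
```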

Project Structure

jobline/
├── assets/
│   ├── js/
│   │   ├── app.js              # Main JavaScript entry point
│   │   ├── pcm-processor.js    # AudioWorklet processor for PCM conversion
│   │   └── hooks/
│   │       └── talk_hooks.js   # Real-time audio streaming with Web Audio API
│   └── css/
│       └── app.css             # Tailwind CSS styles
├── lib/
│   ├── jobline/
│   │   └── stt/
│   │       ├── session_worker.ex      # STT session GenServer (simplified)
│   │       └── cartesia_websocket.ex  # Cartesia WebSocket client
│   └── jobline_web/
│       ├── live/
│       │   └── talk_live.ex    # Main voice interface LiveView
│       └── router.ex           # Route definitions
├── config/
│   └── config.exs              # Application configuration
├── priv/
│   └── repo/
│       └── migrations/         # Database migrations
└── test/                       # Test files

Development Phases

Phase 1: Audio Infrastructure ✅ Complete

  • ✅ Browser-based audio capture and streaming
  • ✅ AudioWorklet for PCM conversion
  • ✅ Real-time chunk streaming
  • ✅ Core UI and visualization

Phase 2: STT Integration ✅ Complete

  • ✅ Cartesia Ink-Whisper integration
  • ✅ Real-time streaming transcription
  • ✅ Interim and final transcript handling
  • ✅ Sub-1-second latency

Phase 3: AI & TTS Integration ✅ Complete

  • ✅ OpenAI conversation integration
  • ✅ Cartesia Sonic 3 TTS streaming
  • ✅ Real-time audio playback
  • ✅ Conversation history persistence

Phase 4: Continuous Conversation Mode ✅ Complete

  • ✅ Server-side Voice Activity Detection (VAD)
  • ✅ Automatic turn-taking
  • ✅ Conversation timeout management
  • ✅ Full-duplex interruption support
  • ✅ Telephony-like natural conversation experience

Troubleshooting

Microphone Access Issues

  • Ensure HTTPS or localhost (browsers require secure context)
  • Check browser permissions in settings
  • Look for microphone icon in address bar
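
When getUserMedia rejects, the error's `name` usually indicates which of these cases applies. The mapping below is an illustrative sketch; `describeMicError` and its messages are hypothetical, while the error names are the standard ones browsers report.

```javascript
// Map a getUserMedia failure to a user-facing hint.
// Pass window.isSecureContext as the second argument in the browser.
function describeMicError(err, isSecure = true) {
  if (!isSecure) {
    return "Microphone access requires HTTPS or localhost.";
  }
  switch (err && err.name) {
    case "NotAllowedError":
      return "Microphone permission was denied. Check the browser's site permissions.";
    case "NotFoundError":
      return "No microphone was found. Verify that one is connected and working.";
    case "NotReadableError":
      return "The microphone could not be read. Another application may be using it.";
    default:
      return "Microphone access failed. Check the browser console for details.";
  }
}

// Usage in the browser:
//   navigator.mediaDevices.getUserMedia({ audio: true })
//     .catch((err) => alert(describeMicError(err, window.isSecureContext)));
```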

Audio Not Recording

  • Verify microphone is connected and working
  • Check browser console for errors
  • Ensure the AudioWorklet API is supported (see Browser Compatibility)

Chunks Not Sending

  • Check Phoenix server logs for errors
  • Verify network connection
  • Look for retry attempts in browser console

Contributing

This is a POC project. Before contributing:

  1. Read the guidelines in CLAUDE.md and AGENTS.md
  2. Run mix precommit before committing
  3. Follow existing code patterns and conventions

License

[Specify your license here]

About

POC for AI voice agents for Service Industry
