LiveKit voice agents for simulating and monitoring doctor-patient conversations with intelligent medical information extraction.
- Python 3.11.x (specifically 3.11, not 3.12+)
- LiveKit Cloud account and credentials
- OpenAI API key
- Virtual environment (venv)
- Create and activate a virtual environment:

  ```bash
  python3.11 -m venv venv
  source venv/bin/activate  # On Windows: venv\Scripts\activate
  ```

- Install dependencies:

  ```bash
  make install
  # or
  pip install -r requirements.txt
  ```

- Configure environment variables:

  Create a `.env` file with:

  ```
  LIVEKIT_URL=wss://your-project.livekit.cloud
  LIVEKIT_API_KEY=your-api-key
  LIVEKIT_API_SECRET=your-api-secret
  OPENAI_API_KEY=your-openai-key
  ```

```bash
./start_agent.sh  # Automatically stops old agents and starts a new one
```

```bash
make run   # Production mode (stops old agents first)
make dev   # Development mode with auto-reload
make stop  # Stop all running agents
```

```bash
python app.py dev
```

The diagnosis API is required for the diagnosis panel in the frontend to work:
```bash
# In a separate terminal, with venv activated:
make pydantic-diagnosis-dev  # Runs on port 8000 with auto-reload
# This starts the structured JSON diagnosis service that the frontend expects
```

Important Notes:
- Use `make pydantic-diagnosis-dev`, NOT `make diagnosis-api-dev`
- The `pydantic-diagnosis` version returns structured JSON (diagnosis, follow_up_questions, further_tests)
- The `diagnosis-api` version returns plain text streaming and won't work with the frontend
- The service runs on port 8000 by default
- CORS is configured for ports 3000, 3001, 3002, 3003, and 3005
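To make the structured response concrete, here is a small illustrative check of the JSON shape the frontend expects. The field names come from the notes above, but the sample payload and the `parse_diagnosis_response` helper are assumptions for illustration, not the service's actual schema or code:

```python
import json

# Fields the frontend expects, per the notes above.
EXPECTED_FIELDS = ("diagnosis", "follow_up_questions", "further_tests")

def parse_diagnosis_response(payload: str) -> dict:
    """Validate that a pydantic-diagnosis response has the expected fields."""
    data = json.loads(payload)
    missing = [f for f in EXPECTED_FIELDS if f not in data]
    if missing:
        raise ValueError(f"response missing fields: {missing}")
    return data

# Hypothetical sample payload (illustrative content only).
sample = json.dumps({
    "diagnosis": "Possible dry eye syndrome",
    "follow_up_questions": ["How long have the symptoms lasted?"],
    "further_tests": ["Schirmer tear test"],
})
result = parse_diagnosis_response(sample)
```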
```
backend/
├── app.py                       # Main agent with conversation monitoring
├── basic_agent.py               # Archived - original version without monitoring
├── diagnosis_api.py             # Plain text streaming diagnosis API (not used by frontend)
├── pydantic_diagnosis_agent.py  # Structured JSON diagnosis API (use this one!)
├── medical_extractor_agent.py   # (Planned) Medical information extraction agent
├── start_agent.sh               # Startup script for clean agent management
├── Makefile                     # Common commands and tasks
├── requirements.txt             # Python dependencies
├── logs/                        # Log files directory
│   ├── agent.log                # Runtime logs
│   └── conversation_*.log       # Per-session conversation transcripts
├── transcripts/                 # Saved conversation transcripts (JSON)
└── tests/                       # Test files
```
The system uses an AI doctor agent to simulate real doctor-patient conversations for testing purposes:
- AI Doctor: Dr. Aisha Bin Rashid (ophthalmologist persona)
- Purpose: Testers act as patients to generate realistic medical conversations
- Benefit: No need for real doctors during development and testing
The system now supports traditional phone calls through SIP:
- Inbound Calls: Receive calls on Telnyx number +18773893410
- Outbound Calls: Make appointment confirmation calls
- Dual Mode: Same agent handles both web and phone interactions
- Language Support: English for phone, Arabic for web
- See `docs/SIP_SETUP.md` for complete telephony setup
- Simulates a doctor for testing conversation monitoring
- Natural conversation flow with medical expertise
- Voice-first design optimized for spoken interactions
- Real-time transcript capture of doctor-patient conversations
- Per-session log files: `conversation_{room}_{timestamp}.log`
- Captures both partial and final transcriptions
- Foundation for feeding data to diagnostic systems
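As an illustration of the per-session logging described above, here is a minimal sketch of writing `conversation_{room}_{timestamp}.log` files. The helper name and exact timestamp format are assumptions, not the project's actual code:

```python
import logging
import os
from datetime import datetime

def make_conversation_logger(room: str, log_dir: str = "logs") -> logging.Logger:
    """Create a file logger for one session (illustrative sketch)."""
    os.makedirs(log_dir, exist_ok=True)
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")  # assumed format
    path = os.path.join(log_dir, f"conversation_{room}_{timestamp}.log")
    logger = logging.getLogger(f"conversation.{room}")
    logger.setLevel(logging.INFO)
    handler = logging.FileHandler(path)
    # Matches the "2025-07-26 13:23:58,297 - [event] speaker: text" style
    # shown in the logging section.
    handler.setFormatter(logging.Formatter("%(asctime)s - %(message)s"))
    logger.addHandler(handler)
    return logger

log = make_conversation_logger("demo-room")
log.info("[user_transcript] user: Are you there?")
```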
- Purpose: Intelligent extraction of medical information from conversation streams
- Input: Streaming conversation events
- Processing: LLM-based extraction of symptoms, complaints, medical history
- Output: Structured medical information ready for diagnosis
- Status: Being developed as standalone component first
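Since the extractor is still in development, the sketch below is purely hypothetical: a structured record built from the categories listed above (symptoms, complaints, medical history). The class and field names are assumptions, not the schema `medical_extractor_agent.py` will actually use:

```python
from dataclasses import dataclass, field

@dataclass
class ExtractedMedicalInfo:
    """Hypothetical structured output for the planned extractor."""
    symptoms: list[str] = field(default_factory=list)
    complaints: list[str] = field(default_factory=list)
    medical_history: list[str] = field(default_factory=list)

    def merge(self, other: "ExtractedMedicalInfo") -> None:
        """Accumulate findings as new conversation chunks are processed."""
        self.symptoms.extend(s for s in other.symptoms if s not in self.symptoms)
        self.complaints.extend(c for c in other.complaints if c not in self.complaints)
        self.medical_history.extend(
            h for h in other.medical_history if h not in self.medical_history
        )

# Merging two chunks deduplicates repeated findings.
info = ExtractedMedicalInfo(symptoms=["blurred vision"])
info.merge(ExtractedMedicalInfo(symptoms=["blurred vision", "eye pain"]))
```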
- FastAPI service using Llama3-OpenBioLLM-70B model
- Provides streaming diagnosis responses
- Endpoint: `/api/diagnosis/chat`
- Currently standalone, will be integrated with extractor
- Speech-to-Text: Deepgram (nova-3 model) with multilingual support
- Language Model: OpenAI `gpt-4o-mini`
- Text-to-Speech: OpenAI TTS (alloy voice)
- Voice Activity Detection: Silero VAD
- Turn Detection: Multilingual model
- Automatic cleanup of old agent processes
- Prevents multiple agents from conflicting
- Clean startup/shutdown procedures
- Background process support
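One way to implement the cleanup steps above is a pidfile: record the running agent's PID and signal it before starting a new one. This is only a sketch of the idea under that assumption; the actual `start_agent.sh` may use a different mechanism such as `pkill`:

```python
import os
import pathlib
import signal

PIDFILE = pathlib.Path("agent.pid")  # assumed location for illustration

def stop_old_agent() -> bool:
    """Signal a previously recorded agent; return True if one was found."""
    if not PIDFILE.exists():
        return False
    pid = int(PIDFILE.read_text().strip())
    try:
        os.kill(pid, signal.SIGTERM)  # ask the old process to exit cleanly
    except (ProcessLookupError, PermissionError):
        pass  # stale pidfile: the process is already gone (or not ours)
    PIDFILE.unlink()
    return True
```

A startup script would call `stop_old_agent()` first, launch the new agent, then write its PID back to the pidfile.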
We're building this system incrementally to ensure each component works perfectly before integration:
- Phase 1: ✅ Conversation Monitoring - Capture doctor-patient conversations
- Phase 2: 🚧 Medical Extractor Agent - Standalone LLM agent for information extraction
- Phase 3: 🔮 Integration - Connect extractor to conversation stream
- Phase 4: 🔮 Diagnosis Pipeline - Connect extractor to diagnosis API
- Building as independent component first
- No coupling to existing systems initially
- Focus on streaming input/output capabilities
- LLM-based medical information extraction
```bash
make lint    # Run code linting with ruff
make format  # Format code with ruff
make test    # Run tests with pytest
make clean   # Clean cache files
make stop    # Stop all running agents

# SIP Telephony Commands
make gemini-sip       # Start the enhanced SIP-enabled agent
make outbound-calls   # Make outbound appointment calls
make list-sip-trunks  # List configured SIP trunks
make setup-sip        # Initial SIP configuration setup
```

- Check `logs/agent.log` for runtime issues
- Individual conversation logs in `logs/conversation_*.log`
- Use `tail -f logs/agent.log` to monitor in real-time
Each session creates a detailed log with:
- Session start/end times
- User speech (real-time and final transcriptions)
- Agent responses
- State transitions
- System events
Example log entry:
```
2025-07-26 13:23:58,297 - [user_transcript] user: Are you there?
2025-07-26 13:23:58,902 - [conversation] user: Are you there?
2025-07-26 13:24:06,893 - [conversation] agent: Yes, I'm here! How can I help you today?
```
- Python Version: Must use Python 3.11.x (not 3.12+)
- Virtual Environment: Always activate venv before running
- Process Management: Use provided scripts to avoid zombie processes
- Logs: Check logs directory for debugging information
```bash
make stop         # Stop all agents
./start_agent.sh  # Clean start
```

- Verify agent is receiving job requests in `logs/agent.log`
- Check LiveKit credentials are correct
- Ensure you're connected to the correct LiveKit project

```bash
pkill -f "python.*app.py"
pkill -f "multiprocessing.*spawn"
make run
```

- This usually means another agent is already handling the room
- Run `make stop` and try again
See parent repository for license information.