LiveKit voice agents for simulating and monitoring doctor-patient conversations with intelligent medical information extraction.
- Python 3.11.x (specifically 3.11, not 3.12+)
- LiveKit Cloud account and credentials
- OpenAI API key
- Virtual environment (venv)
- Create and activate a virtual environment:

  ```bash
  python3.11 -m venv venv
  source venv/bin/activate  # On Windows: venv\Scripts\activate
  ```

- Install dependencies:

  ```bash
  make install
  # or
  pip install -r requirements.txt
  ```

- Configure environment variables:

  Create a `.env` file with:

  ```
  LIVEKIT_URL=wss://your-project.livekit.cloud
  LIVEKIT_API_KEY=your-api-key
  LIVEKIT_API_SECRET=your-api-secret
  OPENAI_API_KEY=your-openai-key
  ```

```bash
./start_agent.sh  # Automatically stops old agents and starts a new one
```

```bash
make run   # Production mode (stops old agents first)
make dev   # Development mode with auto-reload
make stop  # Stop all running agents
```

```bash
python app.py dev
```

The diagnosis API is required for the diagnosis panel in the frontend to work:
```bash
# In a separate terminal, with venv activated:
make pydantic-diagnosis-dev  # Runs on port 8000 with auto-reload
# This starts the structured JSON diagnosis service that the frontend expects
```

Important Notes:
- Use `make pydantic-diagnosis-dev`, NOT `make diagnosis-api-dev`
- The `pydantic-diagnosis` version returns structured JSON (diagnosis, follow_up_questions, further_tests)
- The `diagnosis-api` version returns plain text streaming and won't work with the frontend
- The service runs on port 8000 by default
- CORS is configured for ports 3000, 3001, 3002, 3003, and 3005
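To make the structured response concrete, here is a small illustrative check of the JSON shape the frontend expects. The field names come from the notes above, but the sample payload and the `parse_diagnosis_response` helper are assumptions for illustration, not the service's actual schema or code:

```python
import json

# Fields the frontend expects, per the notes above.
EXPECTED_FIELDS = ("diagnosis", "follow_up_questions", "further_tests")

def parse_diagnosis_response(payload: str) -> dict:
    """Validate that a pydantic-diagnosis response has the expected fields."""
    data = json.loads(payload)
    missing = [f for f in EXPECTED_FIELDS if f not in data]
    if missing:
        raise ValueError(f"response missing fields: {missing}")
    return data

# Hypothetical sample payload (illustrative content only).
sample = json.dumps({
    "diagnosis": "Possible dry eye syndrome",
    "follow_up_questions": ["How long have the symptoms lasted?"],
    "further_tests": ["Schirmer tear test"],
})
result = parse_diagnosis_response(sample)
```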
```
backend/
├── app.py                       # Main agent with conversation monitoring
├── basic_agent.py               # Archived - original version without monitoring
├── diagnosis_api.py             # Plain text streaming diagnosis API (not used by frontend)
├── pydantic_diagnosis_agent.py  # Structured JSON diagnosis API (use this one!)
├── medical_extractor_agent.py   # (Planned) Medical information extraction agent
├── start_agent.sh               # Startup script for clean agent management
├── Makefile                     # Common commands and tasks
├── requirements.txt             # Python dependencies
├── logs/                        # Log files directory
│   ├── agent.log                # Runtime logs
│   └── conversation_*.log       # Per-session conversation transcripts
├── transcripts/                 # Saved conversation transcripts (JSON)
└── tests/                       # Test files
```
The system uses an AI doctor agent to simulate real doctor-patient conversations for testing purposes:
- AI Doctor: Dr. Aisha Bin Rashid (ophthalmologist persona)
- Purpose: Testers act as patients to generate realistic medical conversations
- Benefit: No need for real doctors during development and testing
The system now supports traditional phone calls through SIP:
- Inbound Calls: Receive calls on Telnyx number +18773893410
- Outbound Calls: Make appointment confirmation calls
- Dual Mode: Same agent handles both web and phone interactions
- Language Support: English for phone, Arabic for web
- See `docs/SIP_SETUP.md` for complete telephony setup
- Simulates a doctor for testing conversation monitoring
- Natural conversation flow with medical expertise
- Voice-first design optimized for spoken interactions
- Real-time transcript capture of doctor-patient conversations
- Per-session log files: `conversation_{room}_{timestamp}.log`
- Captures both partial and final transcriptions
- Foundation for feeding data to diagnostic systems
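As an illustration of the per-session logging described above, here is a minimal sketch of writing `conversation_{room}_{timestamp}.log` files. The helper name and exact timestamp format are assumptions, not the project's actual code:

```python
import logging
import os
from datetime import datetime

def make_conversation_logger(room: str, log_dir: str = "logs") -> logging.Logger:
    """Create a file logger for one session (illustrative sketch)."""
    os.makedirs(log_dir, exist_ok=True)
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")  # assumed format
    path = os.path.join(log_dir, f"conversation_{room}_{timestamp}.log")
    logger = logging.getLogger(f"conversation.{room}")
    logger.setLevel(logging.INFO)
    handler = logging.FileHandler(path)
    # Matches the "2025-07-26 13:23:58,297 - [event] speaker: text" style
    # shown in the logging section.
    handler.setFormatter(logging.Formatter("%(asctime)s - %(message)s"))
    logger.addHandler(handler)
    return logger

log = make_conversation_logger("demo-room")
log.info("[user_transcript] user: Are you there?")
```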
- Purpose: Intelligent extraction of medical information from conversation streams
- Input: Streaming conversation events
- Processing: LLM-based extraction of symptoms, complaints, medical history
- Output: Structured medical information ready for diagnosis
- Status: Being developed as standalone component first
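Since the extractor is still in development, the sketch below is purely hypothetical: a structured record built from the categories listed above (symptoms, complaints, medical history). The class and field names are assumptions, not the schema `medical_extractor_agent.py` will actually use:

```python
from dataclasses import dataclass, field

@dataclass
class ExtractedMedicalInfo:
    """Hypothetical structured output for the planned extractor."""
    symptoms: list[str] = field(default_factory=list)
    complaints: list[str] = field(default_factory=list)
    medical_history: list[str] = field(default_factory=list)

    def merge(self, other: "ExtractedMedicalInfo") -> None:
        """Accumulate findings as new conversation chunks are processed."""
        self.symptoms.extend(s for s in other.symptoms if s not in self.symptoms)
        self.complaints.extend(c for c in other.complaints if c not in self.complaints)
        self.medical_history.extend(
            h for h in other.medical_history if h not in self.medical_history
        )

# Merging two chunks deduplicates repeated findings.
info = ExtractedMedicalInfo(symptoms=["blurred vision"])
info.merge(ExtractedMedicalInfo(symptoms=["blurred vision", "eye pain"]))
```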
- FastAPI service using Llama3-OpenBioLLM-70B model
- Provides streaming diagnosis responses
- Endpoint: `/api/diagnosis/chat`
- Currently standalone, will be integrated with extractor
- Speech-to-Text: Deepgram (nova-3 model) with multilingual support
- Language Model: OpenAI `gpt-4o-mini`
- Text-to-Speech: OpenAI TTS (alloy voice)
- Voice Activity Detection: Silero VAD
- Turn Detection: Multilingual model
- Automatic cleanup of old agent processes
- Prevents multiple agents from conflicting
- Clean startup/shutdown procedures
- Background process support
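One way to implement the cleanup steps above is a pidfile: record the running agent's PID and signal it before starting a new one. This is only a sketch of the idea under that assumption; the actual `start_agent.sh` may use a different mechanism such as `pkill`:

```python
import os
import pathlib
import signal

PIDFILE = pathlib.Path("agent.pid")  # assumed location for illustration

def stop_old_agent() -> bool:
    """Signal a previously recorded agent; return True if one was found."""
    if not PIDFILE.exists():
        return False
    pid = int(PIDFILE.read_text().strip())
    try:
        os.kill(pid, signal.SIGTERM)  # ask the old process to exit cleanly
    except (ProcessLookupError, PermissionError):
        pass  # stale pidfile: the process is already gone (or not ours)
    PIDFILE.unlink()
    return True
```

A startup script would call `stop_old_agent()` first, launch the new agent, then write its PID back to the pidfile.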
We're building this system incrementally to ensure each component works perfectly before integration:
- Phase 1: ✅ Conversation Monitoring - Capture doctor-patient conversations
- Phase 2: 🚧 Medical Extractor Agent - Standalone LLM agent for information extraction
- Phase 3: 🔮 Integration - Connect extractor to conversation stream
- Phase 4: 🔮 Diagnosis Pipeline - Connect extractor to diagnosis API
- Building as independent component first
- No coupling to existing systems initially
- Focus on streaming input/output capabilities
- LLM-based medical information extraction
```bash
make lint    # Run code linting with ruff
make format  # Format code with ruff
make test    # Run tests with pytest
make clean   # Clean cache files
make stop    # Stop all running agents

# SIP Telephony Commands
make gemini-sip       # Start the enhanced SIP-enabled agent
make outbound-calls   # Make outbound appointment calls
make list-sip-trunks  # List configured SIP trunks
make setup-sip        # Initial SIP configuration setup
```

- Check `logs/agent.log` for runtime issues
- Individual conversation logs in `logs/conversation_*.log`
- Use `tail -f logs/agent.log` to monitor in real-time
Each session creates a detailed log with:
- Session start/end times
- User speech (real-time and final transcriptions)
- Agent responses
- State transitions
- System events
Example log entry:
```
2025-07-26 13:23:58,297 - [user_transcript] user: Are you there?
2025-07-26 13:23:58,902 - [conversation] user: Are you there?
2025-07-26 13:24:06,893 - [conversation] agent: Yes, I'm here! How can I help you today?
```
- Python Version: Must use Python 3.11.x (not 3.12+)
- Virtual Environment: Always activate venv before running
- Process Management: Use provided scripts to avoid zombie processes
- Logs: Check logs directory for debugging information
```bash
make stop         # Stop all agents
./start_agent.sh  # Clean start
```

- Verify agent is receiving job requests in `logs/agent.log`
- Check LiveKit credentials are correct
- Ensure you're connected to the correct LiveKit project

```bash
pkill -f "python.*app.py"
pkill -f "multiprocessing.*spawn"
make run
```

- This usually means another agent is already handling the room
- Run `make stop` and try again
See parent repository for license information.