A friendly voice assistant for Tiny Miracles employees, providing compassionate support for HR, helpdesk, and production queries in Hindi/Marathi.
bhAI is a voice-first assistant that:
- Understands Hindi, Marathi, and Hinglish (code-mixed) speech
- Answers questions about salary, leave, benefits, and workplace policies
- Responds in a natural, warm Hindi voice
- Escalates sensitive issues to the human impact team
bhAI is an agent with two layers: a reactive one that answers each incoming voice note, and a proactive one that reaches out on its own a few times a day. Both share one encrypted store, one LLM (Sonnet 4.6), and one voice (Sarvam TTS).
```mermaid
flowchart TD
    subgraph REACTIVE["Reactive layer · answers a voice note"]
        direction TB
        V["Voice note in"] --> STT["STT · Sarvam saaras"]
        STT --> RT{{"Router · one Sonnet call<br/>picks 1–3 KB files + a use-case tag"}}
        RT --> LLM{{"Main LLM · Sonnet 4.6<br/>adaptive thinking + web_search"}}
        LLM --> POST["Parse out · strip blocks · persist memory/threads<br/>send image / map / number as separate messages<br/>escalate → Gmail email"]
        POST --> TTS["TTS · Sarvam bulbul"] --> VO["Voice note out"]
    end
    subgraph PROACTIVE["Proactive layer · sends a nudge · 3 slots/day"]
        direction TB
        SLOT["Slot fires (~6am / 1pm / 10pm)<br/>active-user + throttle gating"] --> DOS["Build dossier<br/>memory + open threads + past nudges & reactions"]
        DOS --> THINK{{"ProactiveThinker"}}
        THINK -->|"morning / night"| CHK["Light check-in (no tools)"]
        THINK -->|"afternoon"| SUB["brainstorm → critique → tools → draft → judge"]
        CHK --> NUD["Nudge (voice + optional artifact)"]
        SUB --> NUD
    end
    NUD --> POST
    CORE[("Shared core: encrypted SQLite · Sonnet · Sarvam TTS · Gmail escalation")]
    LLM -.uses.-> CORE
    THINK -.uses.-> CORE
```
See ARCHITECTURE.md for the full end-to-end pipeline documentation.
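The proactive layer's slot selection and gating can be sketched in Python. This is a minimal illustration under stated assumptions, not the logic in `inference/webhooks/nudges.py`. The following are all assumptions:

- the exact slot hours
- the 7-day "active user" window
- the 4-hour throttle gap
- the names `UserState`, `slot_for`, `should_nudge`, and `plan`

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional, Tuple

# Slot hours taken from the diagram (~6am / 1pm / 10pm); exact times are assumed.
SLOTS = {"morning": 6, "afternoon": 13, "night": 22}

# Hypothetical gating thresholds; the real values are not documented here.
ACTIVE_WINDOW = timedelta(days=7)  # user must have messaged within this window
MIN_GAP = timedelta(hours=4)  # minimum spacing between two nudges

# The afternoon slot runs the full agent loop; morning/night are light check-ins.
AFTERNOON_STAGES = ("brainstorm", "critique", "tools", "draft", "judge")
CHECKIN_STAGES = ("checkin",)


@dataclass
class UserState:
    last_user_message: Optional[datetime]
    last_nudge: Optional[datetime]


def slot_for(now: datetime) -> Optional[str]:
    """Name of the slot whose hour matches `now`, or None between slots."""
    for name, hour in SLOTS.items():
        if now.hour == hour:
            return name
    return None


def should_nudge(user: UserState, now: datetime) -> bool:
    """Active-user + throttle gating."""
    if user.last_user_message is None or now - user.last_user_message > ACTIVE_WINDOW:
        return False  # inactive user: skip
    if user.last_nudge is not None and now - user.last_nudge < MIN_GAP:
        return False  # throttled: a nudge went out too recently
    return True


def plan(user: UserState, now: datetime) -> Optional[Tuple[str, ...]]:
    """Stages to run for this user at `now`, or None if no nudge is sent."""
    slot = slot_for(now)
    if slot is None or not should_nudge(user, now):
        return None
    return AFTERNOON_STAGES if slot == "afternoon" else CHECKIN_STAGES
```

The sketch returns the stage list rather than running it, so the gating can be tested without calling Sonnet. For example, a user who last messaged a day ago and was last nudged at 6am gets the full afternoon loop at 1pm.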
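- Python 3.10+
- uv for dependency management
- ffmpeg for audio processing
- API keys for:
  - Sarvam AI (always required, for STT/TTS)
  - Anthropic (for the default `claude` LLM backend)
  - OpenAI (only if `LLM_BACKEND=openai`)
- A Telegram bot token (only needed to run the Telegram webhook)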
```bash
# Clone the repository
git clone https://github.com/sundar911/bhAI_voicebot.git bhAI_voice_bot
cd bhAI_voice_bot

# Install dependencies
uv sync

# Copy and configure environment
cp .env.example .env
# Edit .env with your API keys
```

The `.env` file should contain:
```bash
# LLM Backend: "sarvam", "openai", or "claude" (pilot default)
LLM_BACKEND=claude

# Sarvam AI (required for STT/TTS)
SARVAM_API_KEY=...
SARVAM_STT_MODEL=saaras:v3
SARVAM_TTS_MODEL=bulbul:v3
SARVAM_TTS_VOICE=suhani

# Telegram bot (entry point; replaces Twilio/WhatsApp)
TELEGRAM_BOT_TOKEN=...
TELEGRAM_WEBHOOK_SECRET=...

# Claude (default LLM for pilot)
ANTHROPIC_API_KEY=sk-ant-...

# OpenAI (only needed when LLM_BACKEND=openai)
# OPENAI_API_KEY=sk-...

# Encryption (required for conversation memory)
BHAI_ENCRYPTION_KEY=...

# Admin/dashboard auth (default: bhai-pilot-2026)
DASHBOARD_SECRET=...

# Escalation emails to the impact team (Gmail API, since Railway blocks SMTP).
# When an LLM response contains ESCALATE: true, an email goes out.
# See ARCHITECTURE.md §8 for the per-category routing logic.
GMAIL_CLIENT_ID=...
GMAIL_CLIENT_SECRET=...
GMAIL_REFRESH_TOKEN=...
GMAIL_SENDER_EMAIL=...
ESCALATION_RECIPIENTS=rishikesh@...             # mental_health / unknown-category TO
ESCALATION_RECIPIENTS_DOCS_BC=priti@...         # BC docs (also loan-hardship TO)
ESCALATION_RECIPIENTS_DOCS_MIDC=dinesh@...      # MIDC docs
ESCALATION_RECIPIENTS_WORKPLACE=simran@...      # workplace grievance (HR)
ESCALATION_IMPACT_HEAD=anu@...                  # CC'd on impact-team categories
ESCALATION_CC=sundar@...                        # operator, CC'd on every escalation
ESCALATION_ENABLED=true

# Proactive nudges (off by default; NUDGE_ENABLED is the master kill switch)
NUDGE_ENABLED=true
NUDGE_PHONES=*    # * = all active users, or comma-separated hashes

# Multimodal & web (reactive bot). Any one of the NANOBANANA/GEMINI/GOOGLE_GENAI/GOOGLE_API keys works for image generation.
WEB_SEARCH_ENABLED=true
GOOGLE_API_KEY=...    # image generation (Gemini) + web search
```

See `.env.example` for all available options.
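The sketch below shows one way per-category routing could turn these variables into To/CC lists. It is a hedged sketch, and several parts of it are assumptions for illustration:

- the category keys (`docs`, `workplace`, `loan_hardship`, `mental_health`)
- the `site` values
- the function names
- which categories count as "impact-team"

ARCHITECTURE.md §8 is the authoritative description.

```python
import os
from dataclasses import dataclass, field
from typing import List, Mapping, Optional

# Assumption: the "impact-team categories" that CC the impact head are the two
# the README lists with "+ Anu" (serious mental health, loan hardship).
IMPACT_TEAM_CATEGORIES = {"mental_health", "loan_hardship"}


@dataclass
class Route:
    to: List[str]
    cc: List[str] = field(default_factory=list)


def _addrs(env: Mapping[str, str], key: str) -> List[str]:
    """Read a comma-separated address list from the environment."""
    return [a.strip() for a in env.get(key, "").split(",") if a.strip()]


def route_escalation(
    category: str, site: Optional[str], env: Mapping[str, str] = os.environ
) -> Route:
    """Hypothetical router: pick To/CC addresses for one escalation."""
    impact_head = _addrs(env, "ESCALATION_IMPACT_HEAD")
    if category == "docs":
        if site == "BC":
            to = _addrs(env, "ESCALATION_RECIPIENTS_DOCS_BC")
        elif site == "MIDC":
            to = _addrs(env, "ESCALATION_RECIPIENTS_DOCS_MIDC")
        else:
            to = impact_head  # location unknown: goes to the impact head
    elif category == "workplace":
        to = _addrs(env, "ESCALATION_RECIPIENTS_WORKPLACE")  # HR
    elif category == "loan_hardship":
        to = _addrs(env, "ESCALATION_RECIPIENTS_DOCS_BC")
    else:
        to = _addrs(env, "ESCALATION_RECIPIENTS")  # mental_health / unknown category
    cc = _addrs(env, "ESCALATION_CC")  # operator is CC'd on every escalation
    if category in IMPACT_TEAM_CATEGORIES:
        cc += [a for a in impact_head if a not in to and a not in cc]
    return Route(to=list(to), cc=cc)
```

The sketch reads every variable as a comma-separated list, so one variable can hold several addresses without code changes.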
```bash
# Process a single audio file
uv run python inference/scripts/run_demo.py --audio path/to/audio.m4a

# Skip TTS output
uv run python inference/scripts/run_demo.py --audio path/to/audio.m4a --no_tts
```

Repository layout:

```
bhAI_voice_bot/
├── src/bhai/                   # Core library
│   ├── stt/                    # Speech-to-text backends (7 models)
│   ├── tts/                    # Text-to-speech (Sarvam bulbul:v3, ElevenLabs)
│   ├── llm/                    # Language model backends (Sarvam, OpenAI, Claude)
│   │   ├── prompts/            # Persona prompt + per-use-case blocks (use_cases/)
│   │   ├── llm_router.py       # Sonnet 4.6 KB + use-case classifier (was haiku_router.py)
│   │   └── kb_router.py        # Keyword fallback router
│   ├── proactive/              # Brainstorm→critique→tools→draft→judge agent for nudges
│   ├── escalations/            # ESCALATE: true → Gmail API → impact team
│   ├── pipelines/              # Processing pipelines (base + hr_admin)
│   ├── memory/                 # Encrypted store, summarizer, self-edited memory
│   ├── resilience/             # FAQ cache (legacy), retry, worker (Twilio-era)
│   ├── security/               # Encryption (Fernet), webhook auth, rate limiting
│   └── integrations/           # Telegram, Twilio (legacy), SharePoint, email_client
│
├── src/tests/                  # Test suite (567 tests, incl. test_contracts.py + test_proactive_*)
│
├── knowledge_base/             # Domain knowledge (editable by TM team)
│   ├── shared/                 # Cross-domain (escalation, style)
│   ├── hr_admin/               # HR-specific policies
│   ├── helpdesk/               # Govt docs + schemes (~27 markdown files, Excel source)
│   └── users/                  # Per-user profiles (gitignored; see ARCHITECTURE.md §13)
│
├── data/                       # Audio data and transcriptions
│   ├── sharepoint_sync/        # Auto-synced audio from SharePoint
│   └── transcription_dataset/  # Ground-truth transcriptions
│
├── benchmarking/               # STT model evaluation
│   ├── scripts/                # Benchmark runners and analysis
│   ├── configs/                # Model registry (models.yaml)
│   └── results/                # Comparison CSVs, significance reports
│
├── inference/                  # Production inference
│   ├── scripts/                # CLI tools
│   ├── web/                    # Dev web chat UI (localhost:8002)
│   └── webhooks/               # Telegram bot entry + nudges loop
│       ├── telegram_webhook.py # Active entry point
│       ├── nudges.py           # 3 daily proactive nudges (morning/night check-ins + afternoon utility)
│       └── twilio_webhook.py   # Legacy (Twilio era; not used)
│
├── scripts/                    # Utility scripts (SharePoint sync, cleanup, profiles)
│
└── .github/workflows/          # CI (tests, black, isort, mypy)
```
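The tree pairs `llm_router.py` (Sonnet) with `kb_router.py` (keyword fallback). One plausible way to combine them is sketched below: validate the router's reply, and fall back to keywords when the reply is malformed. Several parts of the sketch are hypothetical:

- the JSON reply shape
- the example file names
- the keyword table
- the fallback's tagging rule

The real prompts and parsing live in `src/bhai/llm/`.

```python
import json
from typing import Dict, Iterable, List, Optional, Set, Tuple

USE_CASES = {"grievance", "finance_advice", "scheme_kb", "general"}


def parse_router_output(raw: str, known_files: Set[str]) -> Optional[Tuple[List[str], str]]:
    """Validate a reply shaped like {"files": [...], "use_case": "..."} (assumed shape).

    Returns None when the reply is malformed so the caller can fall back.
    """
    try:
        data = json.loads(raw)
        files = [f for f in data["files"] if isinstance(f, str) and f in known_files]
        tag = data["use_case"]
    except (ValueError, KeyError, TypeError):  # JSONDecodeError is a ValueError
        return None
    if not isinstance(tag, str) or tag not in USE_CASES or not 1 <= len(files) <= 3:
        return None
    return files, tag


def keyword_fallback(query: str, keywords: Dict[str, Iterable[str]]) -> List[str]:
    """Stand-in for kb_router.py: up to 3 files whose (lowercase) keywords occur in the query."""
    q = query.lower()
    return [path for path, words in keywords.items() if any(w in q for w in words)][:3]


def select_context(
    query: str, router_reply: str, known_files: Set[str], keywords: Dict[str, Iterable[str]]
) -> Tuple[List[str], str]:
    """Return (helpdesk files to inject, use-case tag) for one turn."""
    parsed = parse_router_output(router_reply, known_files)
    if parsed is None:
        files = keyword_fallback(query, keywords)
        tag = "scheme_kb" if files else "general"  # assumed tagging for the fallback path
    else:
        files, tag = parsed
    # Helpdesk KB files are injected into the prompt only on scheme_kb turns.
    return (files if tag == "scheme_kb" else []), tag
```

Because `select_context` checks the tag only after routing, the `scheme_kb` rule holds on both the Sonnet path and the fallback path.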
The `knowledge_base/` folder holds the domain knowledge bhAI uses to answer questions. You edit it with Claude Code (connected to this GitHub repo).
Just tell Claude Code what to change. For example:
- "Update the leave policy in knowledge_base/hr_admin/policies.md"
- "Add helpdesk info about Aadhaar card help"
Claude Code will make the edit, create a branch, and push it. Sundar reviews and approves.
See knowledge_base/README.md for writing guidelines and file structure.
See CONTRIBUTING.md for guidelines on contributing to this project.
```bash
# Run all tests
uv run pytest

# Run with coverage
uv run pytest --cov=src/bhai
```

GitHub Actions runs on every push and PR to `main` and `dev`:
- `test`: pytest + black + isort
- `lint`: mypy type checking
```bash
# Compare all 7 models across all domains
python3 benchmarking/scripts/compare_models.py

# Statistical significance report
python3 benchmarking/scripts/statistical_significance.py

# Error analysis waterfall
python3 benchmarking/scripts/error_analysis.py --domain helpdesk
```

See `benchmarking/BENCHMARKING.md` for full methodology and results.
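This README does not say which metric `compare_models.py` reports. Assuming it is word error rate (WER), the usual STT metric, a minimal word-level Levenshtein version looks like this. It is a sketch only. Real benchmarks typically normalize punctuation and Unicode before scoring, and this sketch skips that step.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript is empty")
    # prev[j] = edit distance between the reference prefix seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,  # deletion of r
                curr[j - 1] + 1,  # insertion of h
                prev[j - 1] + int(r != h),  # substitution (or match)
            )
        prev = curr
    return prev[-1] / len(ref)
```

For example, `wer("mujhe chhutti chahiye", "mujhe chutti chahiye")` has one substitution in three reference words, so the result is one third.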
```bash
# Start the Telegram webhook server locally
uv run uvicorn inference.webhooks.telegram_webhook:app --host 0.0.0.0 --port 8001

# Register the webhook with Telegram's setWebhook method (production uses Railway's public URL).
# Pass TELEGRAM_WEBHOOK_SECRET from .env as setWebhook's secret_token; Telegram then sends it
# back in the X-Telegram-Bot-Api-Secret-Token header on every update, and the server checks it.
```

The bot replaces the old Twilio/WhatsApp integration. See ARCHITECTURE.md §1 for the request flow.
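Both halves of the secret-token handshake can be sketched with only the standard library. The following are part of the Telegram Bot API:

- `setWebhook`
- its `url` and `secret_token` parameters
- the `X-Telegram-Bot-Api-Secret-Token` header

The helper names and the `/telegram` path in the test values are placeholders. The route actually served by `telegram_webhook.py` isn't documented here.

```python
import hmac
from typing import Optional
from urllib.parse import urlencode

TELEGRAM_API = "https://api.telegram.org"


def set_webhook_url(bot_token: str, public_url: str, secret: str) -> str:
    """Build a setWebhook call; the Bot API accepts parameters in the query string."""
    query = urlencode({"url": public_url, "secret_token": secret})
    return f"{TELEGRAM_API}/bot{bot_token}/setWebhook?{query}"


def secret_matches(header_value: Optional[str], expected: str) -> bool:
    """Constant-time check of the X-Telegram-Bot-Api-Secret-Token header."""
    if header_value is None:
        return False  # header missing: reject the update
    return hmac.compare_digest(header_value.encode(), expected.encode())
```

Opening the built URL (for example with `urllib.request.urlopen`) registers the webhook. Using `hmac.compare_digest` instead of `==` avoids leaking the secret through comparison timing.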
```bash
# Full voice pipeline in-browser (mic → STT → LLM → TTS → playback)
uv run python inference/web/chat_server.py
# Open http://127.0.0.1:8002
```

- STT: Sarvam AI (`saaras:v3`), statistically validated as the best of 7 models on 175 Hindi recordings
- LLM: one of the following, configurable via `LLM_BACKEND`:
  - Claude Sonnet (pilot default)
  - Sarvam (`sarvam-105b`)
  - OpenAI (`gpt-4o-mini`)
- TTS: Sarvam AI (`bulbul:v3`, `suhani` voice) or ElevenLabs (voice cloning). Sarvam auto-detects the script on each call and switches between Hindi/Marathi/Tamil/Telugu/Bengali/Punjabi/Gujarati/Kannada/Malayalam/Odia.
- Messaging: Telegram bot (replaces Twilio/WhatsApp)
- Security: Fernet encryption for PII at rest, Telegram secret-token webhook auth
- Framework: Python, FastAPI, pydub
- KB retrieval: Claude Sonnet 4.6 routes each query to 1-3 helpdesk files. It also emits a use-case tag (`grievance`/`finance_advice`/`scheme_kb`/`general`) for prompt scoping. The helpdesk KB is injected only on `scheme_kb` turns (see ARCHITECTURE.md §5-6).
- Escalation: `ESCALATE: true` from the LLM triggers a Gmail-API email to the impact team, routed per category:
  - docs → Priti for BC, Dinesh for MIDC, Anu when the location is unknown
  - workplace grievance → Simran in HR
  - serious mental health → Rishi + Anu
  - loan hardship → Priti + Anu

  Sundar is CC'd on every escalation. Anu (impact head) is CC'd on impact-team categories.
- Multimodal & web: the reactive bot can do three extra things:
  - run Anthropic `web_search` for current facts
  - generate images on request (Gemini `gemini-2.5-flash-image`)
  - emit Google Maps link blocks for locations

  All of these are kept out of the TTS voice and delivered as separate Telegram messages.
- Proactive agent: a brainstorm→critique→tools→draft→judge loop (`src/bhai/proactive/`) generates the 3 daily nudges. Morning and night are light check-ins; the afternoon nudge is the substantive utility output.
- Memory: per-user encrypted SQLite, with a background summarizer and Letta-style self-edited memory (the LLM emits `<memory>`/`<thread>` blocks)
- Anti-confabulation: a regex backstop runs on every LLM response. It catches claims that bhAI has already reached out to someone (past tense), or will reach out without the user's consent (future tense), and re-prompts the model.
- Deployment: Railway (auto-deploys from `main`; `uv` pinned via `railpack.json`). Data persists on a volume mounted at `/app/data` (see ARCHITECTURE.md §13).
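The memory-block handling and the anti-confabulation backstop can be sketched as a single post-processing pass. The `<memory>`/`<thread>` tag names and the `ESCALATE: true` marker come from this README. Everything else is hypothetical and illustrative only:

- the outreach regexes (English, plus one Hindi example)
- the `user_consented` flag
- the names `ParsedReply` and `parse_reply`

```python
import re
from dataclasses import dataclass, field
from typing import List

BLOCK_RE = re.compile(r"<(memory|thread)>(.*?)</\1>", re.DOTALL)
ESCALATE_RE = re.compile(r"ESCALATE:\s*true", re.IGNORECASE)

# Hypothetical outreach-claim patterns: "already did it" vs "will do it".
PAST_OUTREACH = [
    re.compile(r"\bI(?:'ve| have)? (?:told|informed|emailed|called)\b", re.IGNORECASE),
    re.compile(r"मैंने\s+\S+\s+को\s+बता\s+दिया"),  # "I have told X"
]
FUTURE_OUTREACH = [
    re.compile(r"\bI(?:'ll| will) (?:tell|inform|email|call)\b", re.IGNORECASE),
]


@dataclass
class ParsedReply:
    speech: str  # text that goes to TTS
    memories: List[str] = field(default_factory=list)
    threads: List[str] = field(default_factory=list)
    escalate: bool = False
    needs_reprompt: bool = False


def parse_reply(raw: str, user_consented: bool = False) -> ParsedReply:
    memories: List[str] = []
    threads: List[str] = []
    for kind, body in BLOCK_RE.findall(raw):
        if kind == "memory":
            memories.append(body.strip())
        else:
            threads.append(body.strip())
    escalate = ESCALATE_RE.search(raw) is not None
    # Whatever remains after removing blocks and the flag is what TTS speaks.
    speech = ESCALATE_RE.sub("", BLOCK_RE.sub("", raw))
    speech = re.sub(r"\s+", " ", speech).strip()
    claims_past = any(p.search(speech) for p in PAST_OUTREACH)
    claims_future = any(p.search(speech) for p in FUTURE_OUTREACH)
    # Past-tense claims are always confabulated; future ones need consent.
    needs_reprompt = claims_past or (claims_future and not user_consented)
    return ParsedReply(speech, memories, threads, escalate, needs_reprompt)
```

When `needs_reprompt` is true, the caller would send the model a correction and retry rather than voicing the reply.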
[Add license information]
For issues or questions, contact the development team or open a GitHub issue.