sundar911/bhAI_voicebot


bhAI Voice Bot

A friendly voice assistant for Tiny Miracles employees, providing compassionate support for HR, helpdesk, and production queries in Hindi/Marathi.

Overview

bhAI is a voice-first assistant that:

  • Understands Hindi, Marathi, and Hinglish (code-mixed) speech
  • Answers questions about salary, leave, benefits, and workplace policies
  • Responds in a natural, warm Hindi voice
  • Escalates sensitive issues to the human impact team

Architecture

bhAI is an agent with two layers: a reactive one that answers each incoming voice note, and a proactive one that reaches out on its own a few times a day. Both share one encrypted store, one LLM (Sonnet 4.6), and one voice (Sarvam TTS).

flowchart TD
    subgraph REACTIVE["Reactive layer — answers a voice note"]
        direction TB
        V["Voice note in"] --> STT["STT · Sarvam saaras"]
        STT --> RT{{"Router · one Sonnet call<br/>picks 1–3 KB files + a use-case tag"}}
        RT --> LLM{{"Main LLM · Sonnet 4.6<br/>adaptive thinking + web_search"}}
        LLM --> POST["Parse out · strip blocks · persist memory/threads<br/>send image / map / number as separate messages<br/>escalate → Gmail email"]
        POST --> TTS["TTS · Sarvam bulbul"] --> VO["Voice note out"]
    end

    subgraph PROACTIVE["Proactive layer — sends a nudge · 3 slots/day"]
        direction TB
        SLOT["Slot fires (~6am / 1pm / 10pm)<br/>active-user + throttle gating"] --> DOS["Build dossier<br/>memory + open threads + past nudges & reactions"]
        DOS --> THINK{{"ProactiveThinker"}}
        THINK -->|"morning / night"| CHK["Light check-in (no tools)"]
        THINK -->|"afternoon"| SUB["brainstorm → critique → tools → draft → judge"]
        CHK --> NUD["Nudge (voice + optional artifact)"]
        SUB --> NUD
    end

    NUD --> POST
    CORE[("Shared core: encrypted SQLite · Sonnet · Sarvam TTS · Gmail escalation")]
    LLM -.uses.-> CORE
    THINK -.uses.-> CORE

See ARCHITECTURE.md for the full end-to-end pipeline documentation.
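
The "parse out · strip blocks" step of the reactive layer can be pictured roughly as below. This is a minimal sketch under stated assumptions, not the repository's code: the function name `split_reply` and the closing-tag syntax (`<memory>...</memory>`, `<thread>...</thread>`) are hypothetical, and only the tag names themselves come from this README.

```python
import re

# Assumed block syntax: <memory>...</memory> and <thread>...</thread>.
_BLOCK_RE = re.compile(r"<(memory|thread)>(.*?)</\1>", re.DOTALL)


def split_reply(raw: str) -> tuple[str, dict[str, list[str]]]:
    """Separate the text to be spoken from structured blocks in an LLM reply."""
    blocks: dict[str, list[str]] = {"memory": [], "thread": []}
    for tag, body in _BLOCK_RE.findall(raw):
        blocks[tag].append(body.strip())
    # Drop the blocks, then collapse leftover whitespace before TTS.
    spoken = " ".join(_BLOCK_RE.sub(" ", raw).split())
    return spoken, blocks
```

Only `spoken` would go on to TTS; the extracted blocks would be persisted separately, which keeps bookkeeping text out of the voice note.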

Quick Start

Prerequisites

  • Python 3.10+
  • UV for dependency management
  • ffmpeg for audio processing
  • API keys for Sarvam AI (required), plus OpenAI or Anthropic if using those LLM backends

Installation

# Clone the repository
git clone https://github.com/sundar911/bhAI_voicebot.git
cd bhAI_voicebot

# Install dependencies
uv sync

# Copy and configure environment
cp .env.example .env
# Edit .env with your API keys

Configuration

Create a .env file with:

# LLM Backend: "sarvam", "openai", or "claude" (pilot default)
LLM_BACKEND=claude

# Sarvam AI (required for STT/TTS)
SARVAM_API_KEY=...
SARVAM_STT_MODEL=saaras:v3
SARVAM_TTS_MODEL=bulbul:v3
SARVAM_TTS_VOICE=suhani

# Telegram bot (entry point — replaces Twilio/WhatsApp)
TELEGRAM_BOT_TOKEN=...
TELEGRAM_WEBHOOK_SECRET=...

# Claude (default LLM for pilot)
ANTHROPIC_API_KEY=sk-ant-...

# OpenAI (only needed when LLM_BACKEND=openai)
# OPENAI_API_KEY=sk-...

# Encryption (required for conversation memory)
BHAI_ENCRYPTION_KEY=...

# Admin/dashboard auth (default: bhai-pilot-2026)
DASHBOARD_SECRET=...

# Escalation emails to impact team (Gmail API — Railway blocks SMTP)
# When ESCALATE: true fires in an LLM response, an email goes out.
# See ARCHITECTURE.md §8 for the per-category routing logic.
GMAIL_CLIENT_ID=...
GMAIL_CLIENT_SECRET=...
GMAIL_REFRESH_TOKEN=...
GMAIL_SENDER_EMAIL=...
ESCALATION_RECIPIENTS=rishikesh@...           # mental_health / unknown-category TO
ESCALATION_RECIPIENTS_DOCS_BC=priti@...        # BC docs (also loan-hardship TO)
ESCALATION_RECIPIENTS_DOCS_MIDC=dinesh@...     # MIDC docs
ESCALATION_RECIPIENTS_WORKPLACE=simran@...     # workplace grievance (HR)
ESCALATION_IMPACT_HEAD=anu@...                 # CC'd on impact-team categories
ESCALATION_CC=sundar@...                       # operator — CC'd on every escalation
ESCALATION_ENABLED=true

# Proactive nudges (off by default; NUDGE_ENABLED is the master kill switch, set to true to enable)
NUDGE_ENABLED=true
NUDGE_PHONES=*                                 # * = all active users, or comma-separated hashes

# Multimodal & web (reactive bot). Any of NANOBANANA/GEMINI/GOOGLE_GENAI/GOOGLE_API key works for image gen.
WEB_SEARCH_ENABLED=true
GOOGLE_API_KEY=...                             # image generation (Gemini) + web search

See .env.example for all available options.
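
A quick pre-flight check can catch a missing key before the bot starts. The sketch below is illustrative only: the helper `missing_settings` is hypothetical, the "always required" set reflects only what this README marks as required, and falling back to `claude` when `LLM_BACKEND` is unset is an assumption based on the "pilot default" note.

```python
import os
from collections.abc import Mapping

# Marked "required" in the configuration comments above.
ALWAYS_REQUIRED = ("SARVAM_API_KEY", "BHAI_ENCRYPTION_KEY")

# Extra key each LLM backend needs (the Sarvam key is already covered).
BACKEND_KEYS = {
    "sarvam": (),
    "openai": ("OPENAI_API_KEY",),
    "claude": ("ANTHROPIC_API_KEY",),
}


def missing_settings(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the names of required variables that are unset or empty."""
    backend = env.get("LLM_BACKEND", "claude")  # assumed default
    if backend not in BACKEND_KEYS:
        raise ValueError(f"Unknown LLM_BACKEND: {backend!r}")
    needed = ALWAYS_REQUIRED + BACKEND_KEYS[backend]
    return [name for name in needed if not env.get(name)]
```

Calling `missing_settings()` with no arguments inspects the current process environment; passing a plain dict makes the check easy to test.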

Run Demo

# Process a single audio file
uv run python inference/scripts/run_demo.py --audio path/to/audio.m4a

# Skip TTS output
uv run python inference/scripts/run_demo.py --audio path/to/audio.m4a --no_tts

Project Structure

bhAI_voicebot/
├── src/bhai/              # Core library
│   ├── stt/               # Speech-to-text backends (7 models)
│   ├── tts/               # Text-to-speech (Sarvam bulbul:v3, ElevenLabs)
│   ├── llm/               # Language model backends (Sarvam, OpenAI, Claude)
│   │   ├── prompts/       # Persona prompt + per-use-case blocks (use_cases/)
│   │   ├── llm_router.py  # Sonnet 4.6 KB + use-case classifier (was haiku_router.py)
│   │   └── kb_router.py   # Keyword fallback router
│   ├── proactive/         # Brainstorm→critique→tools→draft→judge agent for nudges
│   ├── escalations/       # ESCALATE: true → Gmail API → impact team
│   ├── pipelines/         # Processing pipelines (base + hr_admin)
│   ├── memory/            # Encrypted store, summarizer, self-edited memory
│   ├── resilience/        # FAQ cache (legacy), retry, worker (Twilio-era)
│   ├── security/          # Encryption (Fernet), webhook auth, rate limiting
│   └── integrations/      # Telegram, Twilio (legacy), SharePoint, email_client
│
├── src/tests/             # Test suite (567 tests, incl. test_contracts.py + test_proactive_*)
│
├── knowledge_base/        # Domain knowledge (editable by TM team)
│   ├── shared/            # Cross-domain (escalation, style)
│   ├── hr_admin/          # HR-specific policies
│   ├── helpdesk/          # Govt docs + schemes (~27 markdown files, Excel source)
│   └── users/             # Per-user profiles (gitignored — see ARCHITECTURE.md §13)
│
├── data/                  # Audio data and transcriptions
│   ├── sharepoint_sync/   # Auto-synced audio from SharePoint
│   └── transcription_dataset/  # Ground truth transcriptions
│
├── benchmarking/          # STT model evaluation
│   ├── scripts/           # Benchmark runners and analysis
│   ├── configs/           # Model registry (models.yaml)
│   └── results/           # Comparison CSVs, significance reports
│
├── inference/             # Production inference
│   ├── scripts/           # CLI tools
│   ├── web/               # Dev web chat UI (localhost:8002)
│   └── webhooks/          # Telegram bot entry + nudges loop
│       ├── telegram_webhook.py  # Active entry point
│       ├── nudges.py            # 3 daily proactive nudges (morning/night check-ins + afternoon utility)
│       └── twilio_webhook.py    # Legacy (Twilio era; not used)
│
├── scripts/               # Utility scripts (SharePoint sync, cleanup, profiles)
│
└── .github/workflows/     # CI (tests, black, isort, mypy)

For Tiny Miracles Team

Editing the Knowledge Base

The knowledge_base/ folder contains all the information bhAI uses to answer questions. You edit this using Claude Code (connected to this GitHub repo).

Just tell Claude Code what to change. For example:

  • "Update the leave policy in knowledge_base/hr_admin/policies.md"
  • "Add helpdesk info about Aadhaar card help"

Claude Code will make the edit, create a branch, and push it. Sundar reviews and approves.

See knowledge_base/README.md for writing guidelines and file structure.

Contributing

See CONTRIBUTING.md for guidelines on contributing to this project.

Development

Running Tests

# Run all tests
uv run pytest

# Run with coverage
uv run pytest --cov=src/bhai

CI/CD

GitHub Actions runs on every push/PR to main and dev:

  • test: pytest + black + isort
  • lint: mypy type checking

STT Benchmarking

# Compare all 7 models across all domains
python3 benchmarking/scripts/compare_models.py

# Statistical significance report
python3 benchmarking/scripts/statistical_significance.py

# Error analysis waterfall
python3 benchmarking/scripts/error_analysis.py --domain helpdesk

See benchmarking/BENCHMARKING.md for full methodology and results.

Telegram Bot (production entry point)

# Start the Telegram webhook server locally
uv run uvicorn inference.webhooks.telegram_webhook:app --host 0.0.0.0 --port 8001

# Register the webhook with Telegram (production deploy uses Railway's public URL)
# Set TELEGRAM_WEBHOOK_SECRET in .env; Telegram sends it back in the X-Telegram-Bot-Api-Secret-Token header

The bot replaces the old Twilio/WhatsApp integration. See ARCHITECTURE.md §1 for the request flow.
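
For orientation, registering a webhook and checking the secret token could look like the sketch below. It relies on the Telegram Bot API's `setWebhook` method and its `secret_token` parameter; the helper names `set_webhook_url` and `is_from_telegram` are hypothetical and are not taken from `telegram_webhook.py`, whose actual route and checks may differ.

```python
import hmac
from urllib.parse import urlencode


def set_webhook_url(bot_token: str, public_url: str, secret: str) -> str:
    """Build the Bot API setWebhook call that registers `public_url`."""
    query = urlencode({"url": public_url, "secret_token": secret})
    return f"https://api.telegram.org/bot{bot_token}/setWebhook?{query}"


def is_from_telegram(headers: dict[str, str], secret: str) -> bool:
    """Compare the header Telegram sends with the configured secret."""
    received = headers.get("X-Telegram-Bot-Api-Secret-Token", "")
    # Constant-time comparison avoids leaking the secret through timing.
    return hmac.compare_digest(received.encode(), secret.encode())
```

Opening the URL from `set_webhook_url` once (for example with `curl`) registers the webhook. A plain dict lookup is case-sensitive, so in a real server the header would be read through the framework's case-insensitive headers object.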

Dev Web Chat

# Full voice pipeline in-browser (mic → STT → LLM → TTS → playback)
uv run python inference/web/chat_server.py
# Open http://127.0.0.1:8002

Tech Stack

  • STT: Sarvam AI (saaras:v3) — statistically validated as best across 7 models on 175 Hindi recordings
  • LLM: Claude Sonnet (pilot default), Sarvam (sarvam-105b), or OpenAI (gpt-4o-mini) — configurable via LLM_BACKEND
  • TTS: Sarvam AI (bulbul:v3, suhani voice — auto-detects script and switches between Hindi/Marathi/Tamil/Telugu/Bengali/Punjabi/Gujarati/Kannada/Malayalam/Odia per call) or ElevenLabs (voice cloning)
  • Messaging: Telegram bot (replaces Twilio/WhatsApp)
  • Security: Fernet encryption for PII at rest, Telegram secret-token webhook auth
  • Framework: Python, FastAPI, pydub
  • KB retrieval: Claude Sonnet 4.6 routes each query to 1-3 helpdesk files + emits a use-case tag (grievance / finance_advice / scheme_kb / general) for prompt scoping; helpdesk KB is injected only on scheme_kb turns (see ARCHITECTURE.md §5-6)
  • Escalation: ESCALATE: true from the LLM triggers a Gmail-API email to the impact team, routed per category (docs → Priti for BC / Dinesh for MIDC / Anu when unknown; workplace grievance → Simran in HR; serious mental-health → Rishi+Anu; loan hardship → Priti+Anu). Sundar is CC'd on all; Anu (impact head) is CC'd on impact-team categories.
  • Multimodal & web: the reactive bot can run Anthropic web_search for current facts and generate images on request (Gemini gemini-2.5-flash-image), and emits Google-Maps link blocks for locations — all stripped from the TTS voice and delivered as separate Telegram messages
  • Proactive agent: a brainstorm→critique→tools→draft→judge loop (src/bhai/proactive/) generates the 3 daily nudges; morning/night are light check-ins, afternoon is the substantive utility output
  • Memory: per-user encrypted SQLite. Background summarizer + Letta-style self-edited memory (LLM emits <memory> / <thread> blocks)
  • Anti-confabulation: regex backstop on every LLM response detects past-tense / unconsented future-tense outreach claims and re-prompts the model
  • Deployment: Railway (auto-deploys from main, uv pinned via railpack.json). Data persists on a volume mounted at /app/data — see ARCHITECTURE.md §13.
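
The per-category escalation routing described in the list above can be sketched as a small lookup. This is an illustration under assumptions, not the project's implementation: the function `route_escalation`, the category strings (`docs`, `workplace`, `loan_hardship`), and the `site` argument are hypothetical, and treating every category except workplace grievances as an "impact-team category" is a guess. The environment variable names come from the configuration section.

```python
def route_escalation(
    category: str, env: dict[str, str], site: str = ""
) -> tuple[str, list[str]]:
    """Return (to, cc) for one escalation email."""
    head = env["ESCALATION_IMPACT_HEAD"]
    operator = env["ESCALATION_CC"]  # CC'd on every escalation

    if category == "workplace":
        # Goes to HR; assumed not to be an impact-team category.
        return env["ESCALATION_RECIPIENTS_WORKPLACE"], [operator]

    if category == "docs":
        by_site = {
            "BC": env["ESCALATION_RECIPIENTS_DOCS_BC"],
            "MIDC": env["ESCALATION_RECIPIENTS_DOCS_MIDC"],
        }
        to = by_site.get(site, head)  # unknown site: impact head
    elif category == "loan_hardship":
        to = env["ESCALATION_RECIPIENTS_DOCS_BC"]
    else:
        # mental_health and unknown categories
        to = env["ESCALATION_RECIPIENTS"]

    # Avoid CC'ing the impact head when they are already the recipient.
    cc = [operator] if to == head else [head, operator]
    return to, cc
```

Keeping the routing as a pure function of category and configuration means recipient changes stay in `.env` rather than in code.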

License

[Add license information]

Support

For issues or questions, contact the development team or open a GitHub issue.

About

Helpdesk voicebot on Telegram that provides information on registering, updating, and deleting identity documents and on availing government schemes. The bot is also designed to help users with financial, health, and everyday planning.
