Real-time Indic voice translation pipeline. Hindi speaker talks. Kannada speaker hears. Target: under 300ms end-to-end.
India has 600M+ people separated by language and 139M internal migrants, many of whom can't communicate with the people they work for. There are 22 scheduled languages, 780+ dialects, and zero production-ready open-source infrastructure to bridge them in real time.
This repo is the core pipeline.
```text
Audio in (any Indic language)
   ↓
Voice Activity Detection: Silero VAD, <1ms, filters silence
   ↓
Speech Recognition: AI4Bharat IndicWhisper (Faster-Whisper)
   ↓
Neural Machine Translation: IndicTrans2 200M (CTranslate2, int8)
   ↓
Text-to-Speech: AI4Bharat IndicTTS
   ↓
Audio out (target language)
```
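The four stages can overlap in time: while TTS is speaking one utterance, ASR can already be transcribing the next. A minimal sketch of that idea using `asyncio` queues, in the spirit of the "async queue management" that `server/pipeline.py` is described as doing. `run_pipeline` and the `fake_*` stages are hypothetical stand-ins, not the repo's actual API.

```python
import asyncio
from typing import Any, Awaitable, Callable, List

# A stage takes one item and returns the next item, or None to drop it.
Stage = Callable[[Any], Awaitable[Any]]

_DONE = object()  # sentinel that marks the end of the stream


async def _run_stage(stage: Stage, inbox: asyncio.Queue, outbox: asyncio.Queue) -> None:
    """Apply `stage` to every item from `inbox` and forward results to `outbox`."""
    while True:
        item = await inbox.get()
        if item is _DONE:
            await outbox.put(_DONE)  # propagate end-of-stream downstream
            return
        result = await stage(item)
        if result is not None:  # e.g. VAD drops silent chunks
            await outbox.put(result)


async def run_pipeline(stages: List[Stage], chunks: List[Any]) -> List[Any]:
    """Chain stages with bounded queues so they work concurrently on successive chunks."""
    queues = [asyncio.Queue(maxsize=8) for _ in range(len(stages) + 1)]
    workers = [
        asyncio.create_task(_run_stage(stage, queues[i], queues[i + 1]))
        for i, stage in enumerate(stages)
    ]

    async def feed() -> None:
        for chunk in chunks:
            await queues[0].put(chunk)
        await queues[0].put(_DONE)

    feeder = asyncio.create_task(feed())
    outputs: List[Any] = []
    while (item := await queues[-1].get()) is not _DONE:
        outputs.append(item)
    await asyncio.gather(feeder, *workers)
    return outputs


# Hypothetical stand-ins for the real VAD, ASR, NMT and TTS calls.
async def fake_vad(chunk: str) -> Any:
    return chunk or None  # treat an empty chunk as silence


async def fake_asr(chunk: str) -> str:
    return f"text({chunk})"


async def fake_nmt(text: str) -> str:
    return f"kn:{text}"


async def fake_tts(text: str) -> str:
    return f"audio({text})"
```

Running `asyncio.run(run_pipeline([fake_vad, fake_asr, fake_nmt, fake_tts], ["a", "", "b"]))` yields one output per non-silent chunk, in order. Real model calls are blocking CPU/GPU work, so each would likely need wrapping in `asyncio.to_thread(...)` to keep the event loop responsive; the bounded queues (`maxsize=8` is an arbitrary choice) provide backpressure when a downstream stage falls behind.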
Target latency: < 300ms end-to-end on GPU.
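Translation covers all 22 scheduled Indian languages via IndicTrans2: Hindi, Kannada, Tamil, Telugu, Malayalam, Bengali, Gujarati, Marathi, Punjabi, Odia, Assamese, Urdu, Maithili, Santali, Kashmiri, Konkani, Sindhi, Dogri, Manipuri, Bodo, Sanskrit, Nepali. End-to-end voice support is narrower, because it also depends on which languages the ASR and TTS models cover.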
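The quick start below passes short codes such as `src="hi"`, whereas IndicTrans2 identifies languages with FLORES-200-style tags (language code plus script, e.g. `hin_Deva`). A sketch of one possible mapping for the 22 languages above; the short-code keys and the chosen script for multi-script languages (Kashmiri, Sindhi, Manipuri) are assumptions, and the tags should be checked against the IndicTrans2 README.

```python
from typing import Dict

# Hypothetical mapping: short code -> IndicTrans2 language tag.
# For languages written in more than one script, one script is picked as an assumed default.
INDICTRANS2_CODES: Dict[str, str] = {
    "as": "asm_Beng",   # Assamese
    "bn": "ben_Beng",   # Bengali
    "brx": "brx_Deva",  # Bodo
    "doi": "doi_Deva",  # Dogri
    "gom": "gom_Deva",  # Konkani
    "gu": "guj_Gujr",   # Gujarati
    "hi": "hin_Deva",   # Hindi
    "kn": "kan_Knda",   # Kannada
    "ks": "kas_Arab",   # Kashmiri (kas_Deva also exists)
    "mai": "mai_Deva",  # Maithili
    "ml": "mal_Mlym",   # Malayalam
    "mni": "mni_Beng",  # Manipuri (mni_Mtei also exists)
    "mr": "mar_Deva",   # Marathi
    "ne": "npi_Deva",   # Nepali
    "or": "ory_Orya",   # Odia
    "pa": "pan_Guru",   # Punjabi
    "sa": "san_Deva",   # Sanskrit
    "sat": "sat_Olck",  # Santali
    "sd": "snd_Arab",   # Sindhi (snd_Deva also exists)
    "ta": "tam_Taml",   # Tamil
    "te": "tel_Telu",   # Telugu
    "ur": "urd_Arab",   # Urdu
}


def to_indictrans2_code(lang: str) -> str:
    """Translate a short language code into an IndicTrans2 tag, or raise ValueError."""
    try:
        return INDICTRANS2_CODES[lang]
    except KeyError:
        raise ValueError(f"unsupported language code: {lang!r}") from None
```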
```text
sunlo-core/
├── asr/
│   ├── __init__.py
│   ├── transcriber.py     # Faster-Whisper wrapper with streaming support
│   └── vad.py             # Silero VAD integration
├── translate/
│   ├── __init__.py
│   ├── translator.py      # IndicTrans2 CT2 inference
│   └── preprocess.py      # Indic NLP preprocessing, clause detection
├── tts/
│   ├── __init__.py
│   └── synthesiser.py     # IndicTTS wrapper, streaming audio output
├── server/
│   ├── __init__.py
│   ├── main.py            # FastAPI + WebSocket server
│   └── pipeline.py        # Pipeline orchestration, async queue management
├── benchmarks/
│   └── latency.py         # Per-stage latency measurement
├── tests/
├── docker/
│   └── Dockerfile
├── requirements.txt
└── README.md
```
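Clause detection matters for latency: translating each clause as soon as it is complete lets NMT and TTS start before the speaker finishes a long sentence. A deliberately naive, punctuation-based sketch of the idea; `split_clauses` is hypothetical, and `translate/preprocess.py` may do something more sophisticated (e.g. with the Indic NLP Library). It only works when the ASR output is punctuated.

```python
import re
from typing import List

# Clause-final punctuation: danda (U+0964) and double danda (U+0965),
# Arabic full stop (U+06D4) and question mark (U+061F) for Perso-Arabic scripts,
# plus common Latin punctuation. A split happens only when whitespace follows,
# so decimals like "3.5" stay intact.
_CLAUSE_END = re.compile(r"(?<=[\u0964\u0965\u06D4\u061F.?!,;])\s+")


def split_clauses(text: str) -> List[str]:
    """Split text after clause-final punctuation that is followed by whitespace."""
    return [part.strip() for part in _CLAUSE_END.split(text) if part.strip()]
```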
```bash
git clone https://github.com/rohitsux/sunlo-core
cd sunlo-core
pip install -r requirements.txt
```

Run a single translation:
```python
# Run from the repo root so the asr, translate and tts packages are importable.
from asr import transcriber
from translate import translator
from tts import synthesiser

# Transcribe Hindi audio
text = transcriber.transcribe("audio/hindi_sample.wav", source_lang="hi")

# Translate to Kannada
translated = translator.translate(text, src="hi", tgt="kn")

# Synthesise
synthesiser.speak(translated, lang="kn", output="output.wav")
```

Start the WebSocket server:
```bash
uvicorn server.main:app --host 0.0.0.0 --port 8000
```

| Component | Model | Size | Latency (GPU) |
|---|---|---|---|
| ASR | AI4Bharat IndicWhisper (Faster-Whisper) | ~1.5GB | ~60ms |
| NMT | IndicTrans2 indic-indic 200M (CT2 int8) | ~200MB | ~40ms |
| TTS | AI4Bharat IndicTTS | ~300MB | ~80ms (first chunk) |
| VAD | Silero VAD | 1.8MB | <1ms |
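The per-stage figures above add up to roughly 1 + 60 + 40 + 80 = 181 ms, which leaves about 119 ms of the 300 ms target for costs the table does not list, such as waiting for the VAD to detect the end of speech, network round trips, and queueing between stages. A small sketch for checking that budget against measured numbers; `timed` and `headroom_ms` are hypothetical helpers, not the API of `benchmarks/latency.py`.

```python
import time
from contextlib import contextmanager
from typing import Dict, Iterator

TARGET_MS = 300.0

# Per-stage GPU latencies copied from the table; "<1ms" for VAD is rounded up to 1 ms.
TABLE_ESTIMATES_MS: Dict[str, float] = {"vad": 1.0, "asr": 60.0, "nmt": 40.0, "tts": 80.0}


def headroom_ms(stage_ms: Dict[str, float], target_ms: float = TARGET_MS) -> float:
    """Latency budget left after the listed stages (for network, buffering, queueing)."""
    return target_ms - sum(stage_ms.values())


@contextmanager
def timed(stage: str, results: Dict[str, float]) -> Iterator[None]:
    """Store the wall-clock time spent inside the `with` block, in ms, under `stage`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[stage] = (time.perf_counter() - start) * 1000.0
```

In a benchmark run, each stage call would be wrapped in `with timed("asr", measured): ...`, and `headroom_ms(measured)` compared with `headroom_ms(TABLE_ESTIMATES_MS)`. Single runs are noisy, so repeating the measurement and looking at the median or a high percentile is more informative than one number.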
All models are open source. No API keys. No usage fees.
Benchmarks: work in progress, to be updated as the pipeline matures.
| Language pair | WER (ASR) | BLEU (NMT) | End-to-end latency (ms) |
|---|---|---|---|
| Hindi → Kannada | TBD | TBD | TBD |
| Hindi → Tamil | TBD | TBD | TBD |
| Bengali → Hindi | TBD | TBD | TBD |
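The WER column is conventionally the word-level edit distance (substitutions + deletions + insertions) between the ASR output and a reference transcript, divided by the number of reference words. A minimal sketch assuming whitespace tokenization; the benchmark script may compute it differently (for example with a library such as jiwer), and BLEU is usually computed with sacreBLEU rather than by hand.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER via a word-level Levenshtein distance, normalised by reference length."""
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution (or match when cost is 0)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)
```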
Running your own benchmark:

```bash
python benchmarks/latency.py --src hi --tgt kn --audio benchmarks/samples/
```

Requirements:

- Python 3.10+
- CUDA 11.8+ (for GPU inference)
- 4GB+ VRAM recommended
- CPU inference supported (higher latency)
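Both Faster-Whisper and the IndicTrans2 CT2 model run on CTranslate2, which can fall back to CPU when no GPU is present. A sketch of how the device and quantization type might be chosen at startup; `pick_device` is hypothetical, and the compute types shown (`int8_float16` on GPU, `int8` on CPU) are reasonable defaults rather than the repo's confirmed settings.

```python
from typing import Tuple


def pick_device() -> Tuple[str, str]:
    """Return (device, compute_type) for CTranslate2-based models.

    Falls back to CPU if ctranslate2 is not installed or no CUDA device is visible.
    """
    try:
        import ctranslate2
    except ImportError:
        return "cpu", "int8"
    if ctranslate2.get_cuda_device_count() > 0:
        # int8-quantized weights, float16 for the remaining computation
        return "cuda", "int8_float16"
    # int8 quantization reduces CPU latency and memory use
    return "cpu", "int8"
```

The returned pair can be passed as `device=` and `compute_type=` to `faster_whisper.WhisperModel(...)` and `ctranslate2.Translator(...)`, which both accept these keyword arguments.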

Roadmap:

- Repo structure + README
- ASR module with streaming support
- NMT module with CT2 inference
- TTS module with first-chunk streaming
- VAD integration
- FastAPI WebSocket server
- Docker deployment
- Benchmark suite
- LoRA fine-tuning on migrant domain vocabulary
- Multi-speaker diarisation
The models doing the heavy lifting here — IndicTrans2, IndicWhisper, IndicTTS — were built by researchers at IIT Madras and AI4Bharat using public funding and public data. The infrastructure layer should be open too.
If you're building anything in the Indic language space, sunlo-core aims to save you weeks of plumbing work.
Early days. Contributions are welcome: open an issue or reach out directly.
Acknowledgements:

- AI4Bharat: IndicWhisper, IndicTTS, IndicTrans2
- IIT Madras NLP Lab: IndicTrans2
- SYSTRAN: Faster-Whisper
- Silero: VAD
License: MIT