sunlo-core

Real-time Indic voice translation pipeline. Hindi speaker talks. Kannada speaker hears. Target: under 300ms end-to-end.

India has 600M+ people separated by language and 139M internal migrants, many of whom can't communicate with the people they work for. There are 22 scheduled languages, 780+ dialects, and zero production-ready open-source infrastructure to bridge them in real time.

This repo is the core pipeline.

What it does

Audio in (any Indic language)
    ↓
Voice Activity Detection      — Silero VAD, <1ms, filters silence
    ↓
Speech Recognition            — AI4Bharat IndicWhisper (Faster-Whisper)
    ↓
Neural Machine Translation    — IndicTrans2 200M (CTranslate2, int8)
    ↓
Text-to-Speech                — AI4Bharat IndicTTS
    ↓
Audio out (target language)
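The stages above map naturally onto a chain of asyncio queues. Each stage consumes from one queue and produces into the next, so ASR on one utterance can overlap with TTS on the previous one. This is a minimal sketch, not the repo's `server/pipeline.py`: the stage functions `fake_asr`, `fake_nmt` and `fake_tts` are hypothetical stand-ins for the real IndicWhisper, IndicTrans2 and IndicTTS calls.

```python
import asyncio
from typing import Awaitable, Callable

# Sentinel that tells each stage the stream has ended.
END = None

Stage = Callable[[object], Awaitable[object]]


async def run_stage(fn: Stage, inbox: asyncio.Queue, outbox: asyncio.Queue) -> None:
    """Pull items from inbox, apply fn, push results to outbox until END arrives."""
    while True:
        item = await inbox.get()
        if item is END:
            await outbox.put(END)  # propagate shutdown downstream
            return
        await outbox.put(await fn(item))


# Hypothetical stand-ins for the ASR, NMT and TTS model calls.
async def fake_asr(audio: bytes) -> str:
    return audio.decode("utf-8")


async def fake_nmt(text: str) -> str:
    return f"[kn] {text}"


async def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


async def run_pipeline(utterances: list[bytes]) -> list[bytes]:
    stages = [fake_asr, fake_nmt, fake_tts]
    # One queue feeding each stage, plus a final output queue.
    queues = [asyncio.Queue() for _ in range(len(stages) + 1)]
    workers = [
        asyncio.create_task(run_stage(fn, queues[i], queues[i + 1]))
        for i, fn in enumerate(stages)
    ]
    for u in utterances:
        await queues[0].put(u)
    await queues[0].put(END)

    results = []
    while (out := await queues[-1].get()) is not END:
        results.append(out)
    await asyncio.gather(*workers)
    return results


if __name__ == "__main__":
    print(asyncio.run(run_pipeline([b"namaste", b"dhanyavaad"])))
```

Bounded queues (`asyncio.Queue(maxsize=...)`) would add backpressure, so a slow TTS stage cannot let ASR output pile up without limit.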

Target latency: < 300ms end-to-end on GPU.

Supported languages

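Translation covers all 22 scheduled Indian languages via IndicTrans2. Speech recognition and synthesis coverage is narrower and depends on the IndicWhisper and IndicTTS checkpoints used.

Hindi, Kannada, Tamil, Telugu, Malayalam, Bengali, Gujarati, Marathi, Punjabi, Odia, Assamese, Urdu, Maithili, Santali, Kashmiri, Konkani, Sindhi, Dogri, Manipuri, Bodo, Sanskrit, Nepali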
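IndicTrans2 identifies languages with FLORES-200-style, script-qualified codes such as `hin_Deva`, while the Quickstart passes short codes like `"hi"` and `"kn"`. A lookup table bridging the two might look like the sketch below. The short-code keys are an assumption (ISO 639-1 where one exists, otherwise ISO 639-2/3), and for Kashmiri, Manipuri and Sindhi, which IndicTrans2 supports in two scripts, only one script is picked here.

```python
# Hypothetical short-code -> IndicTrans2 (FLORES-200 style) code table.
# Short codes are an assumption: ISO 639-1 where available, else ISO 639-2/3.
INDICTRANS2_CODES: dict[str, str] = {
    "as": "asm_Beng",   # Assamese
    "bn": "ben_Beng",   # Bengali
    "brx": "brx_Deva",  # Bodo
    "doi": "doi_Deva",  # Dogri
    "gom": "gom_Deva",  # Konkani
    "gu": "guj_Gujr",   # Gujarati
    "hi": "hin_Deva",   # Hindi
    "kn": "kan_Knda",   # Kannada
    "ks": "kas_Arab",   # Kashmiri (Perso-Arabic; kas_Deva also exists)
    "mai": "mai_Deva",  # Maithili
    "ml": "mal_Mlym",   # Malayalam
    "mni": "mni_Mtei",  # Manipuri (Meitei Mayek; mni_Beng also exists)
    "mr": "mar_Deva",   # Marathi
    "ne": "npi_Deva",   # Nepali
    "or": "ory_Orya",   # Odia
    "pa": "pan_Guru",   # Punjabi
    "sa": "san_Deva",   # Sanskrit
    "sat": "sat_Olck",  # Santali (Ol Chiki)
    "sd": "snd_Arab",   # Sindhi (Perso-Arabic; snd_Deva also exists)
    "ta": "tam_Taml",   # Tamil
    "te": "tel_Telu",   # Telugu
    "ur": "urd_Arab",   # Urdu
}


def to_indictrans2(code: str) -> str:
    """Translate a short language code into the tag IndicTrans2 expects."""
    try:
        return INDICTRANS2_CODES[code]
    except KeyError:
        raise ValueError(f"unsupported language code: {code!r}") from None
```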

Project structure

sunlo-core/
├── asr/
│   ├── __init__.py
│   ├── transcriber.py       # Faster-Whisper wrapper with streaming support
│   └── vad.py               # Silero VAD integration
├── translate/
│   ├── __init__.py
│   ├── translator.py        # IndicTrans2 CT2 inference
│   └── preprocess.py        # Indic NLP preprocessing, clause detection
├── tts/
│   ├── __init__.py
│   └── synthesiser.py       # IndicTTS wrapper, streaming audio output
├── server/
│   ├── __init__.py
│   ├── main.py              # FastAPI + WebSocket server
│   └── pipeline.py          # Pipeline orchestration, async queue management
├── benchmarks/
│   └── latency.py           # Per-stage latency measurement
├── tests/
├── docker/
│   └── Dockerfile
├── requirements.txt
└── README.md

Quickstart

git clone https://github.com/rohitsux/sunlo-core
cd sunlo-core
pip install -r requirements.txt

Run a single translation:

from asr import transcriber
from translate import translator
from tts import synthesiser

# Transcribe Hindi audio
text = transcriber.transcribe("audio/hindi_sample.wav", source_lang="hi")

# Translate to Kannada
translated = translator.translate(text, src="hi", tgt="kn")

# Synthesise
synthesiser.speak(translated, lang="kn", output="output.wav")
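The Quickstart's `transcriber.transcribe` call could wrap Faster-Whisper roughly as below. This is a sketch, not the repo's implementation. It assumes the IndicWhisper checkpoint has already been converted to CTranslate2 format (for example with CTranslate2's `ct2-transformers-converter`) into a hypothetical `models/indicwhisper-ct2` directory, and it imports `faster_whisper` lazily so the module loads without the dependency.

```python
from functools import lru_cache
from typing import Iterable


def join_segments(segments: Iterable) -> str:
    """Concatenate Faster-Whisper segment texts into one transcript."""
    return " ".join(seg.text.strip() for seg in segments if seg.text.strip())


@lru_cache(maxsize=1)
def load_model(model_dir: str = "models/indicwhisper-ct2", device: str = "cuda"):
    # Imported lazily; the model directory above is a hypothetical path.
    from faster_whisper import WhisperModel

    # float16 on GPU; int8 is a common choice for CPU inference.
    compute_type = "float16" if device == "cuda" else "int8"
    return WhisperModel(model_dir, device=device, compute_type=compute_type)


def transcribe(path: str, source_lang: str = "hi") -> str:
    model = load_model()
    # vad_filter=True runs Faster-Whisper's built-in Silero VAD to skip silence.
    # Greedy decoding (beam_size=1) trades some accuracy for lower latency.
    segments, _info = model.transcribe(
        path, language=source_lang, beam_size=1, vad_filter=True
    )
    # segments is a lazy generator: decoding happens while it is consumed here.
    return join_segments(segments)
```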

Start the WebSocket server:

uvicorn server.main:app --host 0.0.0.0 --port 8000
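A client for the WebSocket server would typically stream small PCM frames rather than whole files, so VAD and ASR can start before the speaker finishes. The message protocol isn't documented here, so this sketch assumes one: raw 16-bit mono PCM sent as binary messages, with the server replying with synthesised audio. It uses the third-party `websockets` package; the endpoint path is a placeholder.

```python
import asyncio
import wave
from typing import Iterator

FRAME_MS = 20  # 20 ms frames are a common granularity for streaming speech


def frame_pcm(pcm: bytes, sample_rate: int = 16000, frame_ms: int = FRAME_MS,
              sample_width: int = 2) -> Iterator[bytes]:
    """Split raw mono PCM into fixed-duration frames (the last one may be shorter)."""
    frame_bytes = sample_rate * frame_ms // 1000 * sample_width
    for start in range(0, len(pcm), frame_bytes):
        yield pcm[start:start + frame_bytes]


async def stream_wav(uri: str, path: str) -> bytes:
    """Send a WAV file to the server frame by frame and return its first reply."""
    import websockets  # third-party client library, imported lazily

    with wave.open(path, "rb") as wav:
        # Assumption: 16-bit mono audio; the sample rate is read from the file.
        rate = wav.getframerate()
        pcm = wav.readframes(wav.getnframes())

    async with websockets.connect(uri) as ws:
        for frame in frame_pcm(pcm, sample_rate=rate):
            await ws.send(frame)                  # binary message of raw PCM
            await asyncio.sleep(FRAME_MS / 1000)  # pace roughly at real time
        reply = await ws.recv()  # assumption: server answers with synthesised audio
    return reply if isinstance(reply, bytes) else reply.encode("utf-8")
```

With the server running locally, `asyncio.run(stream_wav("ws://localhost:8000/ws", "audio/hindi_sample.wav"))` would send the sample; `/ws` is a hypothetical route.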

Models

Component	Model	Size	Latency (GPU)
ASR	AI4Bharat IndicWhisper (Faster-Whisper)	~1.5GB	~60ms
NMT	IndicTrans2 indic-indic 200M (CT2 int8)	~200MB	~40ms
TTS	AI4Bharat IndicTTS	~300MB	~80ms (first chunk)
VAD	Silero VAD	1.8MB	<1ms

All models are open source. No API keys. No usage fees.
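Summing the per-stage figures in the table shows how much of the 300ms target remains for everything that isn't model inference: network transfer, audio buffering, and the silence the VAD waits for before it declares an utterance finished. The numbers are the estimates listed above, not measurements.

```python
# Per-stage GPU latency estimates from the Models table, in milliseconds.
# VAD is listed as "<1ms"; 1 ms is used here as an upper bound.
STAGE_MS = {"vad": 1, "asr": 60, "nmt": 40, "tts_first_chunk": 80}
TARGET_MS = 300


def budget(stage_ms: dict[str, int], target_ms: int) -> tuple[int, int]:
    """Return (total model time, headroom left under the target)."""
    total = sum(stage_ms.values())
    return total, target_ms - total


total, headroom = budget(STAGE_MS, TARGET_MS)
print(f"models: {total} ms, headroom: {headroom} ms")  # models: 181 ms, headroom: 119 ms
```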

Benchmarks

Work in progress. Will be updated as pipeline matures.

Language pair	WER (ASR)	BLEU (NMT)	End-to-end latency (ms)
Hindi → Kannada	—	—	—
Hindi → Tamil	—	—	—
Bengali → Hindi	—	—	—
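The WER column is the word-level edit distance between the ASR transcript and a reference transcript, divided by the number of reference words. Below is a dependency-free sketch of that computation (libraries such as `jiwer` also provide it). Splitting on whitespace is a simplification; text normalisation (punctuation, nukta and other Unicode variants) can shift Indic WER noticeably.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the ref words seen so far and the first j hyp words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution (free if words match)
            )
        prev = curr
    return prev[-1] / len(ref)
```

For example, `wer("मैं घर जा रहा हूँ", "मैं घर जा रही हूँ")` returns 0.2: one substitution in five reference words.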

Running your own benchmark:

python benchmarks/latency.py --src hi --tgt kn --audio benchmarks/samples/
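Per-stage latency measurement can be as simple as wrapping each stage in a timer and aggregating over many runs. The helper below is a sketch of what a script like `benchmarks/latency.py` might use; `StageTimer` is a hypothetical name, not part of the repo. It uses `time.perf_counter` (monotonic, high resolution) and reports medians, which are less sensitive to first-call warm-up outliers than means. If a stage launches asynchronous GPU work (PyTorch, for instance), synchronise the device before the timer stops, or the measurement will undercount.

```python
import time
from collections import defaultdict
from contextlib import contextmanager
from statistics import median


class StageTimer:
    """Collect wall-clock durations (ms) per pipeline stage across many runs."""

    def __init__(self) -> None:
        self.samples: dict[str, list[float]] = defaultdict(list)

    @contextmanager
    def stage(self, name: str):
        start = time.perf_counter()
        try:
            yield
        finally:
            self.samples[name].append((time.perf_counter() - start) * 1000)

    def report(self) -> dict[str, float]:
        """Median latency per stage, in ms."""
        return {name: median(ms) for name, ms in self.samples.items()}


timer = StageTimer()
for _ in range(3):
    with timer.stage("asr"):
        time.sleep(0.01)  # stand-in for a real transcribe() call
print(timer.report())
```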

Requirements

Python 3.10+
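CUDA 11.8+ for GPU inference (current CTranslate2 releases, which Faster-Whisper builds on, are built for CUDA 12; CUDA 11 needs an older CTranslate2 such as 3.24.0)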
4GB+ VRAM recommended
CPU inference supported (higher latency)

Roadmap

Why open source

The models doing the heavy lifting here — IndicTrans2, IndicWhisper, IndicTTS — were built by researchers at IIT Madras and AI4Bharat using public funding and public data. The infrastructure layer should be open too.

If you're building anything in the Indic language space, sunlo-core should save you weeks of plumbing work.

Contributing

Early days. If you're building in Indic voice AI and want to contribute — open an issue or reach out directly.

Credits

AI4Bharat — IndicWhisper, IndicTTS, IndicTrans2
IIT Madras NLP Lab — IndicTrans2
SYSTRAN — Faster-Whisper
Silero — VAD

License

MIT
