rohitsux/sunlo-core


sunlo-core

Real-time Indic voice translation pipeline. Hindi speaker talks. Kannada speaker hears. Under 300ms.

India has 600M+ people separated by language, and 139M internal migrants who can't communicate with the people they work for. It has 22 scheduled languages and 780+ dialects, but zero production-ready open-source infrastructure to bridge them in real time.

This repo is the core pipeline.


What it does

Audio in (any Indic language)
    ↓
Voice Activity Detection      — Silero VAD, <1ms, filters silence
    ↓
Speech Recognition            — AI4Bharat IndicWhisper (Faster-Whisper)
    ↓
Neural Machine Translation    — IndicTrans2 200M (CTranslate2, int8)
    ↓
Text-to-Speech                — AI4Bharat IndicTTS
    ↓
Audio out (target language)

Target latency: < 300ms end-to-end on GPU.
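To stay inside a latency budget like this, the stages are usually run concurrently, so ASR can work on the next chunk while NMT and TTS finish the previous one. The project structure lists `server/pipeline.py` as handling "pipeline orchestration, async queue management". The sketch below shows one common way to do that with `asyncio` queues. It is an illustration under assumptions, not the repo's code: `run_stage`, `run_pipeline` and the `fake_*` stages are hypothetical placeholders standing in for the ASR, NMT and TTS models, and a VAD gate would sit in front of the first queue.

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical stage signature: takes one item, returns the transformed item.
Stage = Callable[[object], Awaitable[object]]


async def run_stage(stage: Stage, inbox: asyncio.Queue, outbox: asyncio.Queue) -> None:
    """Pull items from inbox, process them, push results to outbox.

    None is used as the end-of-stream sentinel and is forwarded downstream.
    """
    while True:
        item = await inbox.get()
        if item is None:
            await outbox.put(None)
            return
        await outbox.put(await stage(item))


async def run_pipeline(stages: list[Stage], chunks: list[object]) -> list[object]:
    # One queue in front of each stage, plus a final output queue.
    queues = [asyncio.Queue() for _ in range(len(stages) + 1)]
    workers = [
        asyncio.create_task(run_stage(stage, queues[i], queues[i + 1]))
        for i, stage in enumerate(stages)
    ]
    for chunk in chunks:
        await queues[0].put(chunk)
    await queues[0].put(None)  # signal end of input

    results = []
    while (out := await queues[-1].get()) is not None:
        results.append(out)
    await asyncio.gather(*workers)
    return results


# Stub stages (placeholders only, no real models are loaded).
async def fake_asr(audio: bytes) -> str:
    return audio.decode()


async def fake_nmt(text: str) -> str:
    return f"[kn] {text}"


async def fake_tts(text: str) -> bytes:
    return text.encode()


if __name__ == "__main__":
    out = asyncio.run(run_pipeline([fake_asr, fake_nmt, fake_tts], [b"namaste", b"kaise ho"]))
    print(out)  # [b'[kn] namaste', b'[kn] kaise ho']
```

Each stage has a single worker and every queue is FIFO, so output order matches input order. Per-chunk latency is roughly the sum of the stage times, while throughput is bounded by the slowest stage. A production version would likely give the queues a `maxsize` so a slow stage applies backpressure instead of letting memory grow.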


Supported languages

All 22 scheduled Indian languages via IndicTrans2:

Hindi, Kannada, Tamil, Telugu, Malayalam, Bengali, Gujarati, Marathi, Punjabi, Odia, Assamese, Urdu, Maithili, Santali, Kashmiri, Konkani, Sindhi, Dogri, Manipuri, Bodo, Sanskrit, Nepali
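The Quickstart passes short codes such as `"hi"` and `"kn"`, while IndicTrans2 itself identifies languages with FLORES-style tags (language code plus script, e.g. `hin_Deva`). A minimal sketch of a lookup table between the two, assuming the short codes are ISO 639 style. The mapping, the name `to_indictrans2`, and the default script for languages IndicTrans2 supports in two scripts (Kashmiri, Manipuri, Sindhi) are assumptions for illustration, not something this README specifies.

```python
# Assumed mapping from short language codes to IndicTrans2 language tags.
# Where IndicTrans2 supports two scripts, the script chosen here is an assumption.
INDICTRANS2_CODES: dict[str, str] = {
    "as": "asm_Beng",   # Assamese
    "bn": "ben_Beng",   # Bengali
    "brx": "brx_Deva",  # Bodo
    "doi": "doi_Deva",  # Dogri
    "gu": "guj_Gujr",   # Gujarati
    "hi": "hin_Deva",   # Hindi
    "kn": "kan_Knda",   # Kannada
    "ks": "kas_Arab",   # Kashmiri (also kas_Deva)
    "gom": "gom_Deva",  # Konkani
    "mai": "mai_Deva",  # Maithili
    "ml": "mal_Mlym",   # Malayalam
    "mni": "mni_Beng",  # Manipuri (also mni_Mtei)
    "mr": "mar_Deva",   # Marathi
    "ne": "npi_Deva",   # Nepali
    "or": "ory_Orya",   # Odia
    "pa": "pan_Guru",   # Punjabi
    "sa": "san_Deva",   # Sanskrit
    "sat": "sat_Olck",  # Santali
    "sd": "snd_Arab",   # Sindhi (also snd_Deva)
    "ta": "tam_Taml",   # Tamil
    "te": "tel_Telu",   # Telugu
    "ur": "urd_Arab",   # Urdu
}


def to_indictrans2(code: str) -> str:
    """Return the IndicTrans2 tag for a short code, or raise ValueError if unknown."""
    try:
        return INDICTRANS2_CODES[code]
    except KeyError:
        raise ValueError(f"unsupported language code: {code!r}") from None
```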


Project structure

sunlo-core/
├── asr/
│   ├── __init__.py
│   ├── transcriber.py       # Faster-Whisper wrapper with streaming support
│   └── vad.py               # Silero VAD integration
├── translate/
│   ├── __init__.py
│   ├── translator.py        # IndicTrans2 CT2 inference
│   └── preprocess.py        # Indic NLP preprocessing, clause detection
├── tts/
│   ├── __init__.py
│   └── synthesiser.py       # IndicTTS wrapper, streaming audio output
├── server/
│   ├── __init__.py
│   ├── main.py              # FastAPI + WebSocket server
│   └── pipeline.py          # Pipeline orchestration, async queue management
├── benchmarks/
│   └── latency.py           # Per-stage latency measurement
├── tests/
├── docker/
│   └── Dockerfile
├── requirements.txt
└── README.md
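`translate/preprocess.py` is listed as doing clause detection. One plausible use is to hand the translator clause-sized pieces as soon as they are transcribed, instead of waiting for a full sentence. The sketch below splits text after a danda (।), double danda (॥), `?`, `!`, `.` or comma when followed by whitespace. Both the rule set and the name `split_clauses` are hypothetical; the repo's actual preprocessing rules aren't described here.

```python
import re

# Assumed clause boundaries: one of these punctuation marks followed by whitespace.
# The lookbehind keeps the punctuation attached to the preceding clause.
_BOUNDARY = re.compile(r"(?<=[।॥?!.,])\s+")


def split_clauses(text: str) -> list[str]:
    """Split transcribed text into clause-sized pieces that can be translated early."""
    return [part.strip() for part in _BOUNDARY.split(text) if part.strip()]


if __name__ == "__main__":
    print(split_clauses("मैं घर जा रहा हूँ। तुम कहाँ हो?"))
    # ['मैं घर जा रहा हूँ।', 'तुम कहाँ हो?']
```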

Quickstart

git clone https://github.com/rohitsux/sunlo-core
cd sunlo-core
pip install -r requirements.txt

Run a single translation:

from asr import transcriber
from translate import translator
from tts import synthesiser

# Transcribe Hindi audio
text = transcriber.transcribe("audio/hindi_sample.wav", source_lang="hi")

# Translate to Kannada
translated = translator.translate(text, src="hi", tgt="kn")

# Synthesise
synthesiser.speak(translated, lang="kn", output="output.wav")

Start the WebSocket server:

uvicorn server.main:app --host 0.0.0.0 --port 8000
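The WebSocket message format and endpoint path aren't documented in this README, so the snippet below does not attempt to talk to the server. It only sketches the client-side framing a streaming audio client typically needs, under stated assumptions: raw 16-bit mono PCM at 16 kHz (the sample rate Whisper-family models expect) sent in fixed 20 ms frames. `frame_bytes` and `iter_frames` are hypothetical helpers.

```python
from typing import Iterator, Optional

# Assumptions (not documented by the repo): 16 kHz, 16-bit, mono PCM in 20 ms frames.
SAMPLE_RATE = 16_000   # samples per second
BYTES_PER_SAMPLE = 2   # 16-bit PCM
FRAME_MS = 20


def frame_bytes(sample_rate: int = SAMPLE_RATE, frame_ms: int = FRAME_MS) -> int:
    """Number of bytes in one mono PCM frame (16 kHz * 20 ms = 320 samples = 640 bytes)."""
    return sample_rate * frame_ms // 1000 * BYTES_PER_SAMPLE


def iter_frames(pcm: bytes, size: Optional[int] = None) -> Iterator[bytes]:
    """Yield fixed-size frames; the final partial frame is zero-padded (silence)."""
    size = size or frame_bytes()
    for start in range(0, len(pcm), size):
        yield pcm[start:start + size].ljust(size, b"\x00")
```

Each yielded frame would then be sent as one binary WebSocket message, in whatever format `server/main.py` actually expects.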

Models

| Component | Model | Size | Latency (GPU) |
|---|---|---|---|
| ASR | AI4Bharat IndicWhisper (Faster-Whisper) | ~1.5GB | ~60ms |
| NMT | IndicTrans2 indic-indic 200M (CT2 int8) | ~200MB | ~40ms |
| TTS | AI4Bharat IndicTTS | ~300MB | ~80ms (first chunk) |
| VAD | Silero VAD | 1.8MB | <1ms |

All models are open source. No API keys. No usage fees.
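As a quick sanity check on the numbers in the Models table above, the per-stage estimates can be added up against the 300ms target and the listed model sizes compared with the 4GB VRAM recommendation. This only does arithmetic on the table's approximate figures, treating VAD's "<1ms" as 1ms; it is not a measurement.

```python
# Approximate per-stage figures copied from the Models table (GPU).
STAGES_MS = {"VAD": 1, "ASR": 60, "NMT": 40, "TTS first chunk": 80}
SIZES_MB = {"ASR": 1500, "NMT": 200, "TTS": 300, "VAD": 1.8}
BUDGET_MS = 300

total_ms = sum(STAGES_MS.values())   # 181 ms of model compute
headroom_ms = BUDGET_MS - total_ms   # 119 ms left in the budget
total_mb = sum(SIZES_MB.values())    # ~2001.8 MB across all four models

print(f"model compute: ~{total_ms} ms, headroom: ~{headroom_ms} ms")
print(f"combined model size: ~{total_mb / 1000:.1f} GB")
```

The remaining ~119ms has to cover everything outside model inference, such as waiting for enough audio to transcribe, queueing between stages, and network transfer. The combined ~2GB of models fits within the recommended 4GB of VRAM with room for activations.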


Benchmarks

Work in progress. Results will be filled in as the pipeline matures.

| Language pair | WER (ASR) | BLEU (NMT) | End-to-end latency |
|---|---|---|---|
| Hindi → Kannada | | | |
| Hindi → Tamil | | | |
| Bengali → Hindi | | | |
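The WER column is word error rate: the word-level edit distance (substitutions + deletions + insertions) between the ASR output and a reference transcript, divided by the number of reference words. A minimal pure-Python sketch, assuming plain whitespace tokenisation with no text normalisation; the repo's benchmark code may normalise punctuation or use a library instead, and `word_error_rate` is a hypothetical name.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev holds the edit-distance row for the reference words processed so far
    # (row 0: turning an empty reference into j hypothesis words costs j insertions).
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of reference word r
                curr[j - 1] + 1,         # insertion of hypothesis word h
                prev[j - 1] + (r != h),  # substitution, free when the words match
            )
        prev = curr
    return prev[-1] / len(ref)
```

For example, `word_error_rate("a b c", "a x c")` is one substitution over three reference words, so 1/3. Note that WER can exceed 1.0 when the hypothesis contains many insertions.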

To run your own benchmark:

python benchmarks/latency.py --src hi --tgt kn --audio benchmarks/samples/
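`benchmarks/latency.py` measures per-stage latency, but its internals aren't documented here, so the following is a generic, hypothetical timing helper (`time_stage`) rather than the script itself. It runs a few warm-up calls to absorb one-off costs, then reports the median and nearest-rank p95 over repeated runs. For GPU stages the callable should block until results are ready (in PyTorch code, for example, by calling `torch.cuda.synchronize()`), otherwise asynchronous kernel launches make the numbers look too small.

```python
import math
import statistics
import time
from typing import Callable


def time_stage(fn: Callable[[], object], runs: int = 20, warmup: int = 3) -> dict[str, float]:
    """Time a zero-argument callable; returns median and p95 latency in milliseconds."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    for _ in range(warmup):  # warm-up calls are not recorded
        fn()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000)
    samples.sort()
    p95 = samples[math.ceil(0.95 * len(samples)) - 1]  # nearest-rank percentile
    return {"median_ms": statistics.median(samples), "p95_ms": p95}
```

Usage, with a sleep standing in for a real stage: `time_stage(lambda: time.sleep(0.01), runs=5, warmup=1)`. Reporting p95 alongside the median matters for a real-time target, since occasional slow chunks are what a listener notices.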

Requirements

  • Python 3.10+
  • CUDA 11.8+ (for GPU inference)
  • 4GB+ VRAM recommended
  • CPU inference supported (higher latency)

Roadmap

  • Repo structure + README
  • ASR module with streaming support
  • NMT module with CT2 inference
  • TTS module with first-chunk streaming
  • VAD integration
  • FastAPI WebSocket server
  • Docker deployment
  • Benchmark suite
  • LoRA fine-tuning on migrant domain vocabulary
  • Multi-speaker diarisation

Why open source

The models doing the heavy lifting here — IndicTrans2, IndicWhisper, IndicTTS — were built by researchers at IIT Madras and AI4Bharat using public funding and public data. The infrastructure layer should be open too.

If you're building anything in the Indic language space, sunlo-core should save you weeks of plumbing work.


Contributing

Early days. If you're building in Indic voice AI and want to contribute, open an issue or reach out directly.


Credits


License

MIT

About

Real-time Indic voice translation pipeline. Audio in (any Indic language) → Audio out (target language). <300ms latency. Built on AI4Bharat models.
