Open-source audio intelligence.
Documentation · HuggingFace Models · Blog
📖 English · 中文 · 日本語 · 한국어 · Español · Deutsch · Français · हिन्दी · Português · Русский
speech-swift — AI speech models for Apple Silicon. ASR, TTS, speech-to-speech, VAD, diarization, and speech enhancement — all running locally via MLX and CoreML. No cloud, no API keys.
speech-android — On-device speech SDK for Android. ASR, TTS, VAD, and noise cancellation via ONNX Runtime with Qualcomm NNAPI acceleration.
speech-core — Cross-platform voice agent pipeline engine in C++. Turn detection, interruption handling, speech queuing, and protocol handling.
soniqo.audio covers setup, usage, and architecture for all three SDKs:
- Getting Started — Installation via Homebrew, SPM, and Gradle
- Guides — Per-model walkthroughs: Qwen3-ASR, Parakeet TDT, Qwen3-TTS, CosyVoice, Kokoro, PersonaPlex, VAD, diarization, denoising, and more
- CLI Reference — All commands and flags
- API & Protocols — Shared Swift protocols and types
- Architecture — Module structure, backends, weight formats, and memory tables
- Benchmarks — RTF, latency, WER, and memory across devices
Integrating on-device speech into your app? Need support, or want your model to be supported?