On-device Qwen3 TTS in Rust: the speech model runs in ggml (GGUF weights), and the vocoder runs in ONNX Runtime. No server required—everything stays on your machine.
| If you want to… | Start here |
|---|---|
| Turn text into a WAV file | Quick start → Synthesize |
| Match a reference voice (speaker / style) | Voice clone prompts |
| Try it interactively in the terminal | Interactive TUI |
| Embed the engine in your own Rust app | Use the qts library crate (see Crates) |
| Tune GPU / CPU backends | Runtime configuration |
You need: Rust, CMake on your PATH, and Git (for the ggml submodule).
- Clone and fetch ggml

  ```sh
  git clone https://github.com/yet-another-ai/qts.git
  cd qts
  git submodule update --init --recursive
  ```

- Build the CLI (the first build compiles vendored ggml and can take a few minutes)

  ```sh
  cargo build --release -p qts_cli
  ```

- Download model files — this repo does not ship weights. Grab a main GGUF plus the shared vocoder ONNX from Hugging Face (or export your own; see docs/models.md) and put them in one folder, for example:

  ```
  models/
    qwen3-tts-0.6b-f16.gguf   # or another supported q4_k / q5_k / q6_k / q8_0 variant
    qwen3-tts-vocoder.onnx
  ```

  Those names match the default lookup used by `--model-dir` (see `ModelPaths`).

- Synthesize

  ```sh
  cargo run --release -p qts_cli -- synthesize \
    --model-dir models \
    --text "Hello from local TTS." \
    --out hello.wav
  ```
On Apple Silicon, default features include Metal and CoreML where applicable. On Linux / Windows, the default build also enables the NVIDIA-oriented vocoder EPs cuda, nvrtx, and tensorrt; DirectML remains available via an extra feature flag on Windows (see Build options).
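A successful quick start leaves `hello.wav` in the working directory. As a quick sanity check, the standard-library sketch below reads the WAV header (the `wav_summary` helper is ours, not part of qts):

```python
import wave

def wav_summary(path: str) -> dict:
    """Return basic header info for a PCM WAV file."""
    with wave.open(path, "rb") as w:
        frames = w.getnframes()
        rate = w.getframerate()
        return {
            "channels": w.getnchannels(),
            "sample_width_bytes": w.getsampwidth(),
            "sample_rate_hz": rate,
            "duration_s": frames / rate,
        }

# After the quick start:
#   print(wav_summary("hello.wav"))
```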
| Path | What it is |
|---|---|
| `crates/` | Rust: GGML bindings, TTS engine (`qts`), CLI/TUI (`qts_cli`) |
| `scripts/` | Python (uv): export GGUF/ONNX and voice-clone protobuf prompts |
| `docs/` | Models, testing, releases, Hugging Face card template |
| `testdata/` | Small fixtures only; keep large checkpoints outside the repo |
| Crate | Role |
|---|---|
| `qts_ggml_sys` | CMake + bindgen FFI to vendored ggml (submodule) |
| `qts_ggml` | Thin wrappers + sys re-export |
| `qts` | Library: GGUF load, tokenizer, transformer inference, speaker encoding, vocoder bridge, protobuf voice-clone types |
| `qts_cli` | `synthesize`, `profile`, and interactive `tui` |
CLI (same engine as the library):
```sh
cargo build -p qts_cli
cargo build -p qts_cli --features metal      # Apple GPU (GGML)
cargo build -p qts_cli --features vulkan     # Vulkan (GGML); needs SDK + `glslc` where applicable
cargo build -p qts_cli --features tensorrt   # NVIDIA TensorRT vocoder
cargo build -p qts_cli --features directml   # Windows vocoder (ONNX DirectML)
cargo build -p qts_cli --features cuda       # NVIDIA vocoder (ONNX CUDA)
```

Library-only examples:

```sh
cargo build -p qts --features metal
cargo build -p qts --features vulkan
```

GPU features are declared on `qts_ggml_sys` / `qts`; details and version pins live in VERSIONS.md. For the vocoder, `qts` and `qts_cli` forward the native ONNX Runtime EP feature set directly, including acl, armnn, azure, cann, coreml, cuda, directml, migraphx, nnapi, nvrtx, onednn, openvino, qnn, rknpu, tensorrt, tvm, vitis, webgpu, and xnnpack. The default feature set now includes cuda, nvrtx, and tensorrt in addition to the existing GGML defaults.
ONNX Runtime build note: ort does not ship prebuilt binaries for every EP combination. Its documented prebuilt bundles cover platform-native EPs like directml, xnnpack, and coreml, plus separate bundles for cuda + tensorrt, webgpu, and nvrtx. If you enable a mixed combination outside those bundles, ort may fall back to downloading a CPU-only runtime unless you compile ONNX Runtime from source. In practice, if you want a single build with cuda, nvrtx, and tensorrt all available together, plan on a source-built ORT.
Runtime behavior: with GPU features enabled, auto prefers Metal on Apple and Vulkan on other platforms, then falls back to CPU if init fails. Builds without those features use CPU only for GGML.
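The fallback behavior described above amounts to walking an ordered chain and taking the first backend that initializes. A minimal Python sketch of that idea (function names and the availability check are illustrative, not the engine's actual code):

```python
def resolve_backend(chain, try_init):
    """Walk a fallback chain and return the first backend that initializes.

    `chain` is an ordered list like ["metal", "vulkan", "cpu"]; `try_init`
    returns True when a backend comes up. Mirrors the documented behavior:
    auto prefers the platform GPU backend, then falls back to CPU.
    """
    for backend in chain:
        if try_init(backend):
            return backend
    raise RuntimeError(f"no backend in chain initialized: {chain}")

# e.g. Metal is absent and Vulkan init fails, so CPU wins:
available = {"cpu"}
print(resolve_backend(["metal", "vulkan", "cpu"], lambda b: b in available))  # prints "cpu"
```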
Full workspace:
```sh
cargo build --workspace
cargo test --workspace
```

Export and prompt tooling live under `scripts/`:

```sh
uv sync
uv run export-model-artifacts --help
uv run export-voice-clone-prompt --help
```

`qts` ships its protobuf schema in `crates/qts/proto/`. Regenerate the checked-in Python stub with `uv run generate-voice-clone-prompt-pb2` after schema changes.
Where to download, how to export, and layout options: docs/models.md.
Default files in one directory (used by `--model-dir`):

- `qwen3-tts-vocoder.onnx`
- One of: `qwen3-tts-0.6b-f16.gguf`, `qwen3-tts-0.6b-q8_0.gguf`, … (see `ModelPaths` for the full preference order)
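To make the lookup concrete, here is a Python sketch of a `--model-dir` scan. The preference order below is an assumption for illustration only; the authoritative order lives in `ModelPaths`:

```python
from pathlib import Path

# Illustrative order only (f16 first, then quantized variants);
# the real preference order is defined by ModelPaths in the qts crate.
GGUF_PREFERENCE = [
    "qwen3-tts-0.6b-f16.gguf",
    "qwen3-tts-0.6b-q8_0.gguf",
    "qwen3-tts-0.6b-q6_k.gguf",
    "qwen3-tts-0.6b-q5_k.gguf",
    "qwen3-tts-0.6b-q4_k.gguf",
]
VOCODER = "qwen3-tts-vocoder.onnx"

def pick_model_files(model_dir: str) -> tuple[Path, Path]:
    """Return (gguf, vocoder) paths, preferring earlier GGUF names."""
    d = Path(model_dir)
    vocoder = d / VOCODER
    if not vocoder.is_file():
        raise FileNotFoundError(vocoder)
    for name in GGUF_PREFERENCE:
        if (d / name).is_file():
            return d / name, vocoder
    raise FileNotFoundError(f"no supported GGUF in {d}")
```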
| Repo | Role |
|---|---|
| GitHub `yet-another-ai/qts` | Source of truth for code, export scripts, tests, docs |
| Hugging Face `dsh0416/Qwen3-TTS-12Hz-0.6B-Base-QTS` | Published GGUF + ONNX artifacts |
Typical flow: change and export from a pinned commit here → upload only binaries to Hugging Face → keep the HF model card in sync with this repo’s docs (template: docs/huggingface-model-card.md).
Release packaging helper:
```sh
cargo xtask hf-release --model Qwen/qts-12Hz-0.6B-Base
```

Add `--hf-repo-dir /path/to/cloned-hf-repo` to sync into an existing clone. CI (`.github/workflows/`) builds release binaries and can publish tagged releases; see the workflow comments for `HF_TOKEN` and related setup.
```sh
cargo run --release -p qts_cli -- synthesize \
  --model-dir models \
  --text "Your line here." \
  --out out.wav
```

Useful knobs include `--threads`, `--frames` (max audio frames), `--temperature`, `--top-p`, `--top-k`, `--language-id`, and `--chunk-size` (see `--help` on the binary). Backend overrides: `--backend`, `--vocoder-ep`, plus fallback chains. `--vocoder-ep` accepts `auto` or any enabled native ORT EP token such as `coreml`, `directml`, `cuda`, `openvino`, `tensorrt`, or `xnnpack`.
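For readers unfamiliar with the sampling knobs, the sketch below shows the standard temperature / top-k / nucleus (top-p) pipeline those flags control. It is the textbook algorithm, not qts's exact implementation:

```python
import math
import random

def sample_top_k_top_p(logits, top_k, top_p, temperature, rng=random):
    """Draw a token index using temperature + top-k + nucleus sampling."""
    # Temperature scaling, then a numerically stable softmax.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(l - m) for l in scaled]
    total = sum(exps)
    probs = [(i, e / total) for i, e in enumerate(exps)]
    # Keep only the top-k most likely tokens.
    probs.sort(key=lambda p: p[1], reverse=True)
    probs = probs[:top_k]
    # Shrink further to the smallest prefix whose mass reaches top_p.
    kept, mass = [], 0.0
    for i, p in probs:
        kept.append((i, p))
        mass += p
        if mass >= top_p:
            break
    # Renormalize over the kept tokens and draw one.
    mass = sum(p for _, p in kept)
    r = rng.random() * mass
    for i, p in kept:
        r -= p
        if r <= 0:
            return i
    return kept[-1][0]
```

Lower temperature and smaller top-k / top-p make output more deterministic; higher values trade stability for variety.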
To stay aligned with upstream Qwen3 TTS, conditioning uses protobuf prompts (exported from Python), not raw reference audio at synthesis time.
Modes:
- xvector-only — speaker identity from the reference clip.
- ICL — identity plus reference text and codec prompt (closer to upstream `create_voice_clone_prompt`).
xvector-only example:

```sh
uv sync
uv run export-voice-clone-prompt \
  --model Qwen/qts-12Hz-0.6B-Base \
  --ref-audio testdata/hello.wav \
  --x-vector-only-mode \
  --out target/hello.xvector.voice-clone-prompt.pb
cargo run --release -p qts_cli -- synthesize \
  --model-dir models \
  --text "hello" \
  --voice-clone-prompt target/hello.xvector.voice-clone-prompt.pb \
  --out target/hello-from-xvector.wav
```

ICL example:
```sh
uv run export-voice-clone-prompt \
  --model Qwen/qts-12Hz-0.6B-Base \
  --ref-audio testdata/hello.wav \
  --ref-text "hello" \
  --out target/hello.voice-clone-prompt.pb
cargo run --release -p qts_cli -- synthesize \
  --model-dir models \
  --text "hello" \
  --voice-clone-prompt target/hello.voice-clone-prompt.pb \
  --out target/hello-from-icl.wav
```

The engine reads fields such as `ref_spk_embedding`, `ref_code`, `ref_text`, and the `icl_mode` / `x_vector_only_mode` flags. Legacy wrapper:
```sh
uv run python scripts/export_voice_clone_prompt.py --help
```

The interactive TUI loads once; then you type lines and hear audio via cpal.
```sh
cargo run --release -p qts_cli -- tui \
  --model-dir models \
  --voice-clone-prompt target/hello.xvector.voice-clone-prompt.pb \
  --language en \
  --chunk-size 4
```

| Key / input | Action |
|---|---|
| Enter | Synthesize current line |
| F2 | Cycle English / Chinese / Japanese |
| Esc, Ctrl-C, or `:q` | Quit |
The header shows the active transformer backend and vocoder execution provider. --language en|zh|ja is a friendly alias; --language-id still sets the raw codec id. --chunk-size trades startup latency vs scheduling overhead (codec frames per playback chunk).
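The chunk-size tradeoff is simple arithmetic, assuming the 12 codec frames per second implied by the "12Hz" in the model name (an assumption on our part):

```python
CODEC_FRAME_RATE_HZ = 12  # assumption: read from the "12Hz" in the model name

def chunk_tradeoff(chunk_size_frames: int) -> tuple[float, float]:
    """Seconds of audio buffered per playback chunk, and chunks the
    scheduler must deliver per second of audio."""
    seconds_per_chunk = chunk_size_frames / CODEC_FRAME_RATE_HZ
    chunks_per_second = CODEC_FRAME_RATE_HZ / chunk_size_frames
    return seconds_per_chunk, chunks_per_second

# --chunk-size 4 buffers 1/3 s of audio per chunk (3 chunks/s);
# --chunk-size 1 starts sooner but wakes the pipeline 12 times per second.
```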
Apple (CoreML vocoder example)
```sh
cargo run --release -p qts_cli -- tui \
  --model-dir models \
  --backend auto \
  --backend-fallback metal,vulkan,cpu \
  --vocoder-ep coreml \
  --chunk-size 4
```

Windows (DirectML vocoder example)

```sh
cargo run --release -p qts_cli --no-default-features --features vulkan,directml -- tui \
  --model-dir models \
  --backend auto \
  --backend-fallback vulkan,cpu \
  --vocoder-ep directml \
  --chunk-size 4
```

Default auto chains:
| Platform | Transformer | Vocoder |
|---|---|---|
| Apple | `metal,vulkan,cpu` | `coreml,cpu` |
| Windows | `vulkan,cpu` | `cuda,nvrtx,tensorrt,directml,cpu` |
| Linux / Other | `vulkan,cpu` | `cuda,nvrtx,tensorrt,cpu` |
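The same defaults, as a programmatic mirror for scripting around the CLI (the dictionary itself is illustrative, not from the codebase; the tokens are the CLI's backend / EP names):

```python
# Mirror of the documented default auto chains, keyed by platform.
DEFAULT_CHAINS = {
    "apple":   {"transformer": ["metal", "vulkan", "cpu"],
                "vocoder": ["coreml", "cpu"]},
    "windows": {"transformer": ["vulkan", "cpu"],
                "vocoder": ["cuda", "nvrtx", "tensorrt", "directml", "cpu"]},
    "other":   {"transformer": ["vulkan", "cpu"],
                "vocoder": ["cuda", "nvrtx", "tensorrt", "cpu"]},
}

def default_chain(platform: str, stage: str) -> list[str]:
    """Look up the default chain; unknown platforms fall into "other"."""
    return DEFAULT_CHAINS.get(platform, DEFAULT_CHAINS["other"])[stage]
```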
| Concern | CLI flags | Environment variables |
|---|---|---|
| GGML backend | `--backend`, `--backend-fallback` | `QWEN3_TTS_BACKEND`, `QWEN3_TTS_BACKEND_FALLBACK` |
| ONNX vocoder EP | `--vocoder-ep`, `--vocoder-ep-fallback` | `QWEN3_TTS_VOCODER_EP`, `QWEN3_TTS_VOCODER_EP_FALLBACK` |
| Experimental talker KV cache | `--talker-kv-mode f16\|turboquant` | — |
| Multi-GPU adapter index | — | `QWEN3_TTS_GPU_DEVICE` (default 0; e.g. `Vulkan0`, `MTL0`) |
When using cargo run -p qts_cli directly, Cargo features (e.g. --features vulkan or --features cuda) must include the backend / execution provider you select with QWEN3_TTS_BACKEND or QWEN3_TTS_VOCODER_EP, or init will fail. The vocoder accepts the native ORT EP tokens cpu, acl, armnn, azure, cann, coreml, cuda, directml, migraphx, nnapi, nvrtx, onednn, openvino, qnn, rknpu, tensorrt, tvm, vitis, webgpu, and xnnpack when the matching feature is enabled.
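A sketch of that fail-fast check, with made-up names and feature sets (qts's real validation lives in the CLI):

```python
def check_backend(token: str, enabled: set[str]) -> str:
    """Fail fast when a requested backend/EP was not compiled in.

    Illustrative only: mirrors the documented rule that selecting a
    backend whose Cargo feature is missing makes init fail.
    """
    if token != "auto" and token not in enabled:
        raise RuntimeError(
            f"'{token}' requested but its Cargo feature is not enabled; "
            f"rebuild with --features {token}"
        )
    return token

# e.g. a build made with `--features vulkan` exposes cpu + vulkan:
enabled = {"cpu", "vulkan"}
```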
Profiling: cargo xtask profile runs the CLI with matching features and sets QWEN3_TTS_BACKEND for you (important for Vulkan on macOS). Example:
```sh
cargo xtask profile cpu --model-dir models --text "hello" --frames 64 --runs 3
cargo xtask profile metal --model-dir models --text "hello" --frames 64
```

Manual equivalent:

```sh
QWEN3_TTS_BACKEND=vulkan cargo run --release -p qts_cli --features vulkan -- profile \
  --text "hello" --model-dir models --frames 64
```

`profile` prints per-stage timings; `--out run1.wav` keeps audio from the first run.
Experimental note: --talker-kv-mode turboquant switches the talker KV cache to a quantized GGML-backed storage path. The cache itself now lives on the selected backend, while host-side quantization and upload are still part of the write-back path. profile reports talker KV allocation plus kv_download, kv_quantize, and kv_upload timing buckets.
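turboquant's exact storage layout is not documented here, but the general scheme behind GGML-style quantized storage (a per-block scale plus small integer codes) can be sketched as:

```python
def quantize_q8_block(values):
    """Quantize one block of floats to int8 codes with a per-block scale.

    Illustrates the general scheme behind GGML-style q8_0 storage;
    turboquant's actual format may differ.
    """
    amax = max(abs(v) for v in values) or 1.0
    scale = amax / 127.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return scale, codes

def dequantize_q8_block(scale, codes):
    """Reverse the mapping; round-trip error is at most scale / 2."""
    return [scale * c for c in codes]
```

The write-back path the profiler reports (`kv_download`, `kv_quantize`, `kv_upload`) corresponds to moving such blocks between host and the selected backend.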
- Fast tests: `cargo test --workspace` (no large downloads).
- Optional integration tests (real checkpoints): set `QWEN3_TTS_MODEL_DIR` — see docs/testing.md.
Benchmarks (need `QWEN3_TTS_BENCH_MODEL_DIR`, etc.):

```sh
cargo xtask bench cpu
cargo xtask bench metal
cargo xtask bench vulkan
```

Set `QWEN3_TTS_BENCH_TALKER_KV_MODE=turboquant` to compare the experimental talker KV cache against the default f16 path.
Alias definition: .cargo/config.toml.
Apache License 2.0 — see LICENSE and NOTICE.
qts is a normal Rust rlib. A Godot extension can depend on it from a gdext crate without a separate C ABI, unless you choose to add one.
- predict-woo/qwen3-tts.cpp for architecture and tensor naming.
- QwenLM/Qwen3-TTS for the model and naming conventions.