KittenTTS C Library

A native C implementation of KittenTTS — a lightweight, CPU-optimized text-to-speech engine built on ONNX Runtime. Produces 24 kHz float32 audio from text using pre-downloaded model files.

This library is a C port of the official Python implementation at KittenML/KittenTTS.

Requirements

Dependency	Version	Purpose
CMake	≥ 3.18	Build system
ONNX Runtime	≥ 1.16	Neural inference
libsndfile	any	WAV file output
libpcre2-8	any	Text normalization regex
espeak-ng	any	Phonemization (via subprocess)

All dependencies except ONNX Runtime are located automatically via pkg-config. ONNX Runtime must be specified explicitly with -DONNXRUNTIME_ROOT (see below).

Building

# Download ONNX Runtime for your platform from:
#   https://github.com/microsoft/onnxruntime/releases
# Or install via package manager (e.g. Homebrew on macOS):
#   brew install onnxruntime

cd c/
mkdir build && cd build
cmake .. -DONNXRUNTIME_ROOT=/path/to/onnxruntime
cmake --build . --parallel

This produces:

libkittentts.dylib / libkittentts.so — shared library
libkittentts.a — static library
kittentts-cli — command-line tool

To install system-wide:

cmake --install . --prefix /usr/local

Getting Model Files

Models are distributed via Hugging Face Hub. Download manually or use the Python package to cache them:

# Using Python (one-time download):
pip install kittentts
python -c "from kittentts import KittenTTS; KittenTTS('KittenML/kitten-tts-nano-0.8')"

After download, locate the files:

~/.cache/huggingface/hub/models--KittenML--kitten-tts-nano-0.8/snapshots/<hash>/
  kitten_tts_nano_v0_8.onnx   ← model file
  voices.npz                  ← voice embeddings

Available model variants:

Model	Size	Parameters
`kitten-tts-nano-0.8`	~15 MB	15M
`kitten-tts-nano-int8-0.8`	~25 MB	15M quantized
`kitten-tts-micro-0.8`	~40 MB	40M
`kitten-tts-mini-0.8`	~80 MB	80M

CLI Usage

kittentts-cli [options] "Text to speak"

Options:
  --model    PATH   Path to the .onnx model file (required)
  --voices   PATH   Path to the voices .npz file (required)
  --output   PATH   Output WAV file (default: output.wav)
  --voice    NAME   Voice name (default: expr-voice-5-m)
  --speed    FLOAT  Speech speed multiplier (default: 1.0)
  --backend  NAME   Execution backend: cpu|cuda|amd_gpu (default: auto)
  --no-clean        Disable text normalization
  --list-voices     List available voices and exit
  --help            Show this help
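The --speed option takes a floating-point multiplier. A minimal sketch of how a caller might validate such a value with the standard strtof, assuming that only finite, positive multipliers are meaningful; parse_speed is a hypothetical helper, not part of kittentts-cli.

```c
#include <errno.h>
#include <math.h>
#include <stdlib.h>

/* Hypothetical helper: parse a --speed argument such as "1.2".
 * Returns 0 on success and stores the value in *out. Returns -1 if the
 * string is empty, has trailing characters, is out of range, or is not a
 * finite positive number (an assumption about what a valid speed is). */
int parse_speed(const char *arg, float *out) {
    char *end = NULL;
    errno = 0;
    float v = strtof(arg, &end);
    if (end == arg || *end != '\0' || errno == ERANGE)
        return -1;              /* no digits, trailing junk, or overflow */
    if (!isfinite(v) || v <= 0.0f)
        return -1;              /* reject inf, nan, zero and negatives */
    *out = v;
    return 0;
}
```

Rejecting trailing characters (for example "1.5x") catches typos that a bare atof call would silently accept.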

Examples

Basic usage:

kittentts-cli \
  --model nano.onnx \
  --voices voices.npz \
  --output hello.wav \
  "Hello, world."

List available voices:

kittentts-cli --model nano.onnx --voices voices.npz --list-voices

Custom voice and speed:

kittentts-cli \
  --model nano.onnx \
  --voices voices.npz \
  --voice expr-voice-2-f \
  --speed 1.2 \
  --output fast.wav \
  "The quick brown fox jumps over the lazy dog."

Disable text normalization (for input that is already normalized):

kittentts-cli --model nano.onnx --voices voices.npz --no-clean \
  --output raw.wav "She sells sea shells by the sea shore."

GPU inference (requires a CUDA-enabled ONNX Runtime build, such as onnxruntime-gpu):

kittentts-cli --model nano.onnx --voices voices.npz \
  --backend cuda --output gpu.wav "Testing GPU synthesis."

Library API

Include <kittentts.h> and link with -lkittentts.

Lifecycle

// Create engine from pre-downloaded files.
// backend: NULL = auto, "cpu", "cuda", "amd_gpu"
KittenTTS *tts = kittentts_create("nano.onnx", "voices.npz", NULL);
if (!tts) {
    fprintf(stderr, "Error: %s\n", kittentts_last_error());
    return 1;
}

// Always destroy when done.
kittentts_destroy(tts);

Batch synthesis

size_t n_samples;
float *audio = kittentts_generate(tts,
    "Hello, world.",   // text (UTF-8)
    "expr-voice-5-m",  // voice name
    1.0f,              // speed
    1,                 // clean_text: normalize numbers, currency, etc.
    &n_samples);

if (!audio) {
    fprintf(stderr, "Error: %s\n", kittentts_last_error());
} else {
    // audio is float32 at 24 kHz, n_samples long
    // ... use audio ...
    kittentts_free_audio(audio);
}
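Because the returned buffer is plain mono float32 at 24 kHz, its length in seconds and its peak level can be computed directly, for example to report duration or detect clipping. A small sketch; audio_duration_seconds, audio_peak and KT_SAMPLE_RATE are hypothetical names, not part of kittentts.h.

```c
#include <stddef.h>

/* Hypothetical constant matching the documented 24 kHz output rate. */
#define KT_SAMPLE_RATE 24000

/* Duration in seconds of a mono buffer at the 24 kHz output rate. */
double audio_duration_seconds(size_t n_samples) {
    return (double)n_samples / KT_SAMPLE_RATE;
}

/* Largest absolute sample value; 0.0f for an empty buffer.
 * Uses a manual absolute value so no libm linkage is needed. */
float audio_peak(const float *audio, size_t n_samples) {
    float peak = 0.0f;
    for (size_t i = 0; i < n_samples; i++) {
        float a = audio[i] < 0.0f ? -audio[i] : audio[i];
        if (a > peak)
            peak = a;
    }
    return peak;
}
```

Call these before kittentts_free_audio(audio), while the buffer is still valid.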

Write directly to WAV

int rc = kittentts_generate_to_file(tts,
    "Hello, world.",
    "output.wav",
    "expr-voice-5-m",
    1.0f,    // speed
    24000,   // sample rate
    1);      // clean_text

Streaming synthesis

Useful for long texts or low-latency playback pipelines. The callback fires once per sentence chunk, so audio can be consumed before the whole text has been synthesized:

void on_chunk(const float *chunk, size_t n_samples, void *userdata) {
    // Stream chunk to audio device, append to buffer, etc.
    // chunk is valid only for the duration of this call.
    fwrite(chunk, sizeof(float), n_samples, (FILE *)userdata);
}

FILE *out = fopen("stream.raw", "wb");
if (!out) {
    perror("stream.raw");
    return 1;
}
int rc = kittentts_generate_stream(tts,
    "Long text spanning many sentences...",
    "expr-voice-5-m",
    1.0f,      // speed
    1,         // clean_text
    on_chunk,
    out);      // userdata: forwarded to on_chunk
fclose(out);
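Since the chunk pointer is only valid during the callback, collecting the whole utterance in memory means copying each chunk. A minimal sketch of a callback with the same shape as on_chunk that appends into a growable buffer passed as userdata; AudioBuf and append_chunk are hypothetical names.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer passed as the userdata argument. */
typedef struct {
    float *data;
    size_t len;    /* samples stored */
    size_t cap;    /* samples allocated */
    int failed;    /* set if an allocation fails; later chunks are dropped */
} AudioBuf;

/* Same signature as on_chunk: copies each chunk into the buffer,
 * doubling the capacity as needed. */
void append_chunk(const float *chunk, size_t n_samples, void *userdata) {
    AudioBuf *buf = userdata;
    if (buf->failed || n_samples == 0)
        return;
    if (buf->len + n_samples > buf->cap) {
        size_t new_cap = buf->cap ? buf->cap : 4096;
        while (new_cap < buf->len + n_samples)
            new_cap *= 2;
        float *p = realloc(buf->data, new_cap * sizeof *p);
        if (!p) {
            buf->failed = 1;   /* keep what we have, stop appending */
            return;
        }
        buf->data = p;
        buf->cap = new_cap;
    }
    memcpy(buf->data + buf->len, chunk, n_samples * sizeof *chunk);
    buf->len += n_samples;
}
```

Usage would follow the call above: declare AudioBuf buf = {0};, pass append_chunk and &buf in place of on_chunk and out, check buf.failed afterwards, and free(buf.data) when done.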

List available voices

int count;
const char **voices = kittentts_available_voices(tts, &count);
for (int i = 0; i < count; i++)
    printf("%s\n", voices[i]);
// Pointers are valid for the lifetime of tts; do not free.
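The returned list can be used to reject an unknown voice name before starting synthesis. A small sketch; voice_exists is a hypothetical helper that works on the const char ** array and count shown above.

```c
#include <string.h>

/* Hypothetical helper: returns 1 if `name` appears in the list returned
 * by kittentts_available_voices(), 0 otherwise. */
int voice_exists(const char **voices, int count, const char *name) {
    if (!voices || !name)
        return 0;
    for (int i = 0; i < count; i++) {
        if (voices[i] && strcmp(voices[i], name) == 0)
            return 1;
    }
    return 0;
}
```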

Error handling

// kittentts_last_error() is thread-local and valid until the next API call
// on the same thread.
const char *err = kittentts_last_error();
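For readers curious how a per-thread error string like this is commonly built, here is a minimal sketch using C11 _Thread_local. It shows the general pattern only and is not taken from the library's sources; set_last_error and demo_last_error are hypothetical names.

```c
#include <stdarg.h>
#include <stdio.h>

/* One buffer per thread, so concurrent callers never overwrite
 * each other's messages. */
static _Thread_local char last_error_buf[256];

/* Format a message into this thread's buffer (truncated if too long). */
void set_last_error(const char *fmt, ...) {
    va_list ap;
    va_start(ap, fmt);
    vsnprintf(last_error_buf, sizeof last_error_buf, fmt, ap);
    va_end(ap);
}

/* The returned pointer stays valid, but its contents change on the next
 * set_last_error() call from the same thread. */
const char *demo_last_error(void) {
    return last_error_buf;
}
```

This is why the message should be copied if it needs to outlive the next call on that thread.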

Text Normalization

When clean_text is enabled (the CLI default; --no-clean turns it off), the preprocessor expands written forms such as numbers, currency, times, and units into spoken words before phonemization:

Input	Output
`$1,200.50`	"one thousand two hundred dollars and fifty cents"
`March 21st`	"March twenty-first"
`3:45 PM`	"three forty-five PM"
`100km/h`	"one hundred kilometers per hour"
`1.5e-3`	"one point five times ten to the power of negative three"
`IV`	"four" (Roman numerals)
`I'm`	"I am" (contractions)
`192.168.1.1`	"one nine two dot one six eight dot one dot one"
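To illustrate the kind of expansion the table describes, here is a minimal sketch that spells out cardinal numbers up to 999,999 (the "1,200" part of the currency example). It is a simplified, hypothetical number_to_words helper: it does not handle ordinals, decimals, currency, or the other rules above, and it is not the library's normalizer.

```c
#include <stdio.h>
#include <string.h>

static const char *ONES[] = {
    "zero", "one", "two", "three", "four", "five", "six", "seven",
    "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
    "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"};
static const char *TENS[] = {
    "", "", "twenty", "thirty", "forty", "fifty",
    "sixty", "seventy", "eighty", "ninety"};

/* Append `word` to `out`, separated by a space, truncating safely. */
static void append(char *out, size_t cap, const char *word) {
    size_t len = strlen(out);
    if (len > 0 && len + 1 < cap) {
        out[len++] = ' ';
        out[len] = '\0';
    }
    snprintf(out + len, cap - len, "%s", word);
}

/* Words for 1..999; appends nothing for 0. */
static void words_below_1000(unsigned n, char *out, size_t cap) {
    if (n >= 100) {
        append(out, cap, ONES[n / 100]);
        append(out, cap, "hundred");
        n %= 100;
    }
    if (n >= 20) {
        char tmp[32];
        if (n % 10)   /* hyphenated compound, e.g. "forty-five" */
            snprintf(tmp, sizeof tmp, "%s-%s", TENS[n / 10], ONES[n % 10]);
        else
            snprintf(tmp, sizeof tmp, "%s", TENS[n / 10]);
        append(out, cap, tmp);
    } else if (n > 0) {
        append(out, cap, ONES[n]);
    }
}

/* Spell out 0..999999; larger values fall back to plain digits. */
void number_to_words(unsigned n, char *out, size_t cap) {
    if (cap == 0)
        return;
    out[0] = '\0';
    if (n > 999999) {
        snprintf(out, cap, "%u", n);
        return;
    }
    if (n == 0) {
        append(out, cap, ONES[0]);
        return;
    }
    if (n >= 1000) {
        words_below_1000(n / 1000, out, cap);
        append(out, cap, "thousand");
        n %= 1000;
    }
    words_below_1000(n, out, cap);
}
```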

Pass clean_text = 0 (library) or --no-clean (CLI) if your input is already normalized or phonetic.

Audio Output Format

Sample rate: 24,000 Hz
Channels: 1 (mono)
Sample format: IEEE float32
WAV files use the SF_FORMAT_WAV | SF_FORMAT_FLOAT libsndfile encoding

To convert to 16-bit PCM WAV for broader compatibility:

ffmpeg -i output.wav -acodec pcm_s16le output_16bit.wav
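If you would rather produce 16-bit samples from your own code instead of post-processing with ffmpeg, the float buffer can be converted in-process. A minimal sketch, assuming the usual [-1.0, 1.0] nominal range for float samples (this README does not state one); float_to_pcm16 is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>

/* Convert float32 samples (nominal range [-1.0, 1.0]) to signed 16-bit PCM.
 * Out-of-range values are clamped so they saturate instead of wrapping,
 * and NaN is treated as silence. */
void float_to_pcm16(const float *in, int16_t *out, size_t n) {
    for (size_t i = 0; i < n; i++) {
        float s = in[i];
        if (s != s)            /* NaN */
            s = 0.0f;
        else if (s > 1.0f)
            s = 1.0f;
        else if (s < -1.0f)
            s = -1.0f;
        /* Scale by 32767 so +1.0 and -1.0 map symmetrically, then round
         * half away from zero without needing libm. */
        float scaled = s * 32767.0f;
        out[i] = (int16_t)(scaled >= 0.0f ? scaled + 0.5f : scaled - 0.5f);
    }
}
```

Scaling by 32767 rather than 32768 gives up the single most negative code (-32768) in exchange for never overflowing at +1.0.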

Licensing

KittenTTS C is licensed under the Apache License 2.0 (see LICENSE).

The project includes or links against several third-party components with their own licenses:

miniz (vendored): MIT License — see THIRD_PARTY_LICENSES.md
ONNX Runtime (linked): MIT License
libsndfile (linked): LGPL 2.1
libpcre2-8 (linked): BSD 3-Clause License
espeak-ng (subprocess): GPL 3.0 (optional runtime dependency, not linked)

See THIRD_PARTY_LICENSES.md for detailed compliance information and distribution guidance.
