Skip to content

daliusd/qtts

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

6 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

qtts - Qwen3-TTS Command Line Interface

A simple, powerful CLI for generating high-quality speech from text using Qwen3-TTS models locally.

Features

  • Voice Cloning: Clone any voice from a 3-second audio sample
  • Preset Voices: 9 premium voices covering multiple languages and dialects
  • Voice Design: Generate custom voices from natural language descriptions
  • Multi-language: Supports 10 major languages (Chinese, English, Japanese, Korean, German, French, Russian, Portuguese, Spanish, Italian)
  • Emotion Control: Fine-tune tone, emotion, and speaking style with text instructions
  • High Quality: 12Hz tokenizer for natural-sounding speech
  • MP3 Export: Direct MP3 output for easy sharing

Quick Start for New Users

git clone <repo-url>
cd qtts
python3.12 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
./qtts.py "Hello world!" -s Ryan -l English

Installation

Prerequisites

  1. Python 3.12 (recommended - see Troubleshooting if using 3.13/3.14)

  2. ffmpeg (for MP3 conversion):

    # macOS
    brew install ffmpeg sox
    
    # Ubuntu/Debian
    sudo apt install ffmpeg sox
    
    # Windows (using chocolatey)
    choco install ffmpeg sox
  3. CUDA (optional, for GPU acceleration - highly recommended)

Setup

  1. Clone or download this repository:

    cd qtts
  2. Create a virtual environment with Python 3.12:

    # macOS/Linux with brew-installed python3.12
    python3.12 -m venv venv
    source venv/bin/activate
    
    # Or use conda
    conda create -n qwen3-tts python=3.12 -y
    conda activate qwen3-tts
  3. Install dependencies:

    pip install -r requirements.txt
  4. (Optional) For better performance with NVIDIA GPU only:

    # Only works on Linux/Windows with CUDA-capable GPU
    # Will NOT work on macOS or CPU-only systems
    pip install flash-attn --no-build-isolation

    Note: Flash-attention requires CUDA GPU. If installation fails, you can skip this step - the CLI will work fine without it (just a bit slower).

  5. Make the script executable:

    chmod +x qtts.py

System-Wide Installation (Optional)

To make qtts available from anywhere on your system, use the wrapper script method:

Note: A pyproject.toml is included for future pip/PyPI distribution, but pip install is not recommended due to potential llvmlite build issues on Python 3.12+. The wrapper script method below is the recommended approach.

  1. Make the wrapper script executable:

    chmod +x qtts-wrapper.sh
  2. Create a symlink in your user bin directory (no sudo required):

    ln -sf "$(pwd)/qtts-wrapper.sh" ~/.local/bin/qtts

    Note: Make sure ~/.local/bin is in your PATH. If not, add this to your ~/.bashrc or ~/.zshrc:

    export PATH="$HOME/.local/bin:$PATH"
  3. Alternatively, for system-wide installation (requires sudo):

    sudo ln -sf "$(pwd)/qtts-wrapper.sh" /usr/local/bin/qtts
  4. Verify installation:

    which qtts
    qtts --list-speakers

After installation, you can run qtts from any directory without the ./ prefix or activating the virtual environment.

Quick Start

Basic Usage (Preset Voice)

# If installed system-wide:
qtts "Hello, welcome to Qwen3-TTS!" -s Vivian -l English

# Or from the project directory:
./qtts.py "Hello, welcome to Qwen3-TTS!" -s Vivian -l English

This generates output.mp3 with the Vivian voice speaking in English.

With Emotion Control

qtts "I'm so excited to meet you!" -s Ryan -i "Very happy and energetic"

Voice Cloning

Clone a voice from a reference audio:

./qtts.py "This is my cloned voice" -m clone \
  --ref-audio path/to/reference.wav \
  --ref-text "The text spoken in the reference audio"

Voice Design

Create a unique voice from a description:

./qtts.py "Hello there!" -m design \
  -i "Young male voice, 25 years old, cheerful and confident tone"

Usage Guide

Command Structure

# If installed system-wide:
qtts [TEXT] [OPTIONS]

# Or from the project directory:
./qtts.py [TEXT] [OPTIONS]

Note: The rest of this guide uses qtts for brevity. If not installed system-wide, use ./qtts.py instead.

Core Options

Option Description Default
-o, --output Output file path output.mp3
-m, --mode Generation mode: custom, clone, or design custom
-l, --language Target language (see list below) Auto

Mode-Specific Options

Custom Voice Mode (Preset Speakers)

Option Description Default
-s, --speaker Preset speaker name Vivian
-i, --instruct Emotion/tone instruction (optional) None

Available Speakers:

Speaker Description Native Language
Vivian Bright, slightly edgy young female Chinese
Serena Warm, gentle young female Chinese
Uncle_Fu Seasoned male, low and mellow Chinese
Dylan Youthful Beijing male, clear Chinese (Beijing)
Eric Lively Chengdu male, husky Chinese (Sichuan)
Ryan Dynamic male, strong rhythm English
Aiden Sunny American male English
Ono_Anna Playful Japanese female Japanese
Sohee Warm Korean female Korean

List all speakers:

qtts --list-speakers

Clone Mode (Voice Cloning)

Option Description Required
--ref-audio Path or URL to reference audio (3+ seconds) Yes
--ref-text Transcript of the reference audio Yes

Design Mode (Voice Description)

Option Description Required
-i, --instruct Natural language voice description Yes

Advanced Options

Option Description
--model Custom model path or HuggingFace ID
--device Device: cuda:0, cpu, mps (auto-detected)
--list-languages Show all supported languages

Supported Languages

Auto, Chinese, English, Japanese, Korean, German, French, Russian, Portuguese, Spanish, Italian

Use --list-languages to display the full list.

Examples

1. English Speech with American Voice

qtts "Welcome to the future of text-to-speech" -s Aiden -l English

2. Chinese Speech with Emotion

qtts "其实我真的有发现,我是一个特别善于观察别人情绪的人。" \
  -s Vivian -l Chinese -i "用特别愤怒的语气说"

3. Japanese Speech

qtts "こんにちは、世界!" -s Ono_Anna -l Japanese

4. Clone Voice from URL

qtts "I am solving the equation" -m clone \
  --ref-audio "https://example.com/voice.wav" \
  --ref-text "Original text from the audio" \
  -l English

5. Design a Character Voice

qtts "Prepare for trouble, and make it double!" -m design \
  -i "Male villain voice, 30s, dramatic and theatrical with a hint of menace" \
  -l English -o villain.mp3

6. Multiple Outputs

Generate multiple files with different voices:

qtts "Hello world" -s Vivian -o output1.mp3
qtts "Hello world" -s Ryan -o output2.mp3
qtts "Hello world" -s Aiden -o output3.mp3

Tips & Best Practices

Voice Cloning

  • Use high-quality reference audio (clear, minimal background noise)
  • Reference audio should be 3-10 seconds long
  • Provide accurate transcription for best results
  • Reference audio quality directly affects output quality

Custom Voices

  • Use each speaker's native language for best quality
  • Keep instructions clear and concise
  • Examples: "Very happy", "Angry and frustrated", "Calm and soothing"

Voice Design

  • Be specific: age, gender, tone, emotional state
  • Good: "25-year-old female, energetic teacher voice, warm and encouraging"
  • Avoid: "Nice voice"

Performance

  • First run downloads models (~1-3GB), be patient
  • GPU (CUDA) recommended for faster generation
  • Models are cached after first use
  • Use shorter texts for faster processing

Language Selection

  • Set specific language when known (faster than Auto)
  • Auto mode works well for mixed-language text
  • Each speaker performs best in their native language

Troubleshooting

Python 3.14/3.13 Installation Issues (macOS)

Problem: If you're using Python 3.14 or 3.13, you may encounter build errors with llvmlite (required by librosa which is needed by qwen-tts).

Solution: Use Python 3.12 instead:

# If you have Python 3.12 installed via Homebrew
rm -rf venv
python3.12 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

Why this happens: The llvmlite package (required by numba, which is required by librosa) doesn't have prebuilt wheels for Python 3.13/3.14 yet, and building from source requires LLVM 20 specifically (macOS typically has LLVM 15 or 21).

Alternative: Use conda environment which provides prebuilt numba/llvmlite:

conda create -n qwen3-tts python=3.12 -y
conda activate qwen3-tts
conda install -c conda-forge numba librosa -y
pip install -r requirements.txt

"CUDA out of memory"

  • Use CPU mode: --device cpu
  • Close other applications using GPU
  • Use smaller model (0.6B instead of 1.7B)

"ffmpeg not found"

  • Install ffmpeg (see Prerequisites)
  • Or save as WAV: -o output.wav

"Model download fails"

  • Check internet connection
  • Models download from HuggingFace on first use
  • Use --model with local path if you pre-downloaded

Poor audio quality

  • Check reference audio quality (for clone mode)
  • Try different speaker/language combinations
  • Ensure text is clean (no special formatting)

Slow generation

  • Use GPU if available
  • Install flash-attention: pip install flash-attn --no-build-isolation
  • Use 0.6B model instead of 1.7B

Model Information

Default Models

  • Custom Mode: Qwen/Qwen3-TTS-12Hz-0.6B-CustomVoice (~600MB)
  • Clone Mode: Qwen/Qwen3-TTS-12Hz-0.6B-Base (~600MB)
  • Design Mode: Qwen/Qwen3-TTS-12Hz-1.7B-VoiceDesign (~1.7GB)

Models are automatically downloaded from HuggingFace on first use and cached locally.

Hardware Requirements

Minimum:

  • CPU: Any modern processor
  • RAM: 8GB
  • Storage: 5GB free

Recommended:

  • GPU: NVIDIA GPU with 4GB+ VRAM (CUDA support)
  • RAM: 16GB
  • Storage: 10GB free

Technical Details

  • Tokenizer: Qwen3-TTS-Tokenizer-12Hz
  • Architecture: Discrete multi-codebook LM
  • Precision: bfloat16 (GPU) / float32 (CPU)
  • Sample Rate: 24kHz output
  • Latency: ~2-5 seconds per sentence (GPU)

License

This project uses Qwen3-TTS models which are licensed under Apache 2.0.

Credits

Built on top of Qwen3-TTS by Alibaba Cloud.

Model page: https://huggingface.co/Qwen/Qwen3-TTS-12Hz-0.6B-Base

Support

For issues with the Qwen3-TTS models themselves, refer to:

For CLI-specific issues, check the troubleshooting section above.


Happy voice synthesizing with qtts!

About

qwen3-tts CLI

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors