🎤 SoulX-Singer

Official inference code for
SoulX-Singer: Towards High-Quality Zero-Shot Singing Voice Synthesis

🎵 Overview

SoulX-Singer is a high-fidelity, zero-shot singing voice synthesis model that enables users to generate realistic singing voices for unseen singers. It supports melody-conditioned (F0 contour) and score-conditioned (MIDI notes) control for precise pitch, rhythm, and expression.

SoulX-Singer-SVC is a singing voice conversion (SVC) model finetuned from SoulX-Singer. Singing Voice Conversion aims to transform a source singing recording into the target singer’s voice while preserving the original melody, rhythm, and lyrical content. Based on the strong generative capability of SoulX-Singer, SoulX-Singer-SVC enables high-quality singing voice conversion directly from raw singing audio, without requiring lyric or MIDI transcriptions.

✨ Key Features

SoulX-Singer

🎤 Zero-Shot Singing – Generate high-fidelity voices for unseen singers, no fine-tuning needed.
🎵 Flexible Control Modes – Melody (F0) and Score (MIDI) conditioning.
📚 Large-Scale Dataset – 42,000+ hours of aligned vocals, lyrics, and notes across Mandarin, English, and Cantonese.
🧑‍🎤 Timbre Cloning – Preserve singer identity across languages, styles, and edited lyrics.
✏️ Singing Voice Editing – Modify lyrics while keeping natural prosody.
🌐 Cross-Lingual Synthesis – High-fidelity synthesis by disentangling timbre from content.

SoulX-Singer-SVC

🎙️ Zero-Shot Timbre and Style Transfer – Transfer singer identity and style to unseen voices without per-speaker fine-tuning.
🌍 Language-Agnostic Conversion – Works across multilingual singing content.
🔄 Transcription-Free Audio-to-Audio Conversion – Convert target singing directly without lyrics transcription or MIDI inputs.

🎬 Demo Examples

Singing Voice Synthesis (SVS)

-Soul-Singer.mp4

-Soux-Singer.mp4

Singing Voice Conversion (SVC)

soulx-singer-svc.mp4

📰 News

[2026-03-16] SoulX-Singer-SVC is released, and SoulX-Singer Online Demo has been updated to support singing voice conversion (SVC).
[2026-02-12] SoulX-Singer Eval Dataset is now available on Hugging Face Datasets.
[2026-02-09] SoulX-Singer Online Demo is live on Hugging Face Spaces — try singing voice synthesis in your browser.
[2026-02-08] MIDI Editor is available on Hugging Face Spaces.
[2026-02-06] SoulX-Singer inference code and models released.

🚀 Quick Start

1. Clone Repository

git clone https://github.com/Soul-AILab/SoulX-Singer.git
cd SoulX-Singer

2. Set Up Environment

1. Install Conda (if not already installed): https://docs.conda.io/en/latest/miniconda.html

2. Create and activate a Conda environment:

conda create -n soulxsinger -y python=3.10
conda activate soulxsinger

3. Install dependencies:

pip install -r requirements.txt

⚠️ If you are in mainland China, use a PyPI mirror:

pip install -r requirements.txt -i https://mirrors.aliyun.com/pypi/simple/ --trusted-host=mirrors.aliyun.com

3. Download Pretrained Models

Install Hugging Face Hub if needed:

pip install -U huggingface_hub

Download the SVS and SVC models and the preprocessing models:

# Download the SoulX-Singer SVS and SVC models
hf download Soul-AILab/SoulX-Singer --local-dir pretrained_models/SoulX-Singer

# Download models required for preprocessing
hf download Soul-AILab/SoulX-Singer-Preprocess --local-dir pretrained_models/SoulX-Singer-Preprocess
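The same download can be scripted from Python, which is convenient when setup has to be repeatable. The following is a minimal sketch, not part of the repository: it assumes `huggingface_hub` is installed, and the helper names `missing_checkpoints` and `download_missing` are hypothetical. The repo IDs and target folders are copied from the commands above.

```python
from pathlib import Path

# Repo IDs and local folders copied from the `hf download` commands above.
CHECKPOINTS = {
    "Soul-AILab/SoulX-Singer": "pretrained_models/SoulX-Singer",
    "Soul-AILab/SoulX-Singer-Preprocess": "pretrained_models/SoulX-Singer-Preprocess",
}


def missing_checkpoints(root="."):
    """Return the repo IDs whose local folder is absent or empty (hypothetical helper)."""
    missing = []
    for repo_id, rel_path in CHECKPOINTS.items():
        folder = Path(root) / rel_path
        if not folder.is_dir() or not any(folder.iterdir()):
            missing.append(repo_id)
    return missing


def download_missing(root="."):
    """Fetch only the checkpoints that are not on disk yet (hypothetical helper)."""
    # Imported lazily so the check above also works without huggingface_hub.
    from huggingface_hub import snapshot_download

    for repo_id in missing_checkpoints(root):
        target = Path(root) / CHECKPOINTS[repo_id]
        snapshot_download(repo_id=repo_id, local_dir=str(target))
```

Calling `download_missing()` from the repository root should leave the same folder layout as the two `hf download` commands. A non-empty folder is treated as complete, so a partially downloaded checkpoint would not be detected by this check.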

4. Run the Demo

Run the SVS inference demo

bash example/infer.sh

This script relies on metadata generated from the preprocessing pipeline, including vocal separation and transcription. Users should follow the steps in preprocess to prepare the necessary metadata before running the demo with their own data.

⚠️ Important Note: The metadata produced by the automatic preprocessing pipeline may not perfectly align the singing audio with the corresponding lyrics and musical notes. For best synthesis quality, we strongly recommend manually correcting the alignment using the 🎼 Midi-Editor.

How to use the Midi-Editor:

Editing Metadata with Midi-Editor

Run the SVC inference demo

bash example/infer_svc.sh

This example performs audio-to-audio SVC, converting the target singing into the prompt timbre using waveform and F0 inputs. To prepare your own SVC data, run example/preprocess.sh with midi_transcribe=False.
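Because SVC is driven by F0 rather than a score, it can help to sanity-check an F0 track before conversion. The sketch below is an illustration only: it assumes the F0 track is a 1-D sequence in Hz with 0 marking unvoiced frames, which this README does not confirm, and the helper names `hz_to_midi` and `voiced_ratio` are hypothetical. It uses the standard mapping from frequency to MIDI note number (A4 = 440 Hz = note 69).

```python
import math


def hz_to_midi(f0_hz, ref_hz=440.0):
    """Map F0 values in Hz to fractional MIDI note numbers.

    Frames with F0 <= 0 are treated as unvoiced and mapped to None
    (an assumption about the file format, not a documented fact).
    """
    return [
        69.0 + 12.0 * math.log2(f / ref_hz) if f > 0 else None
        for f in f0_hz
    ]


def voiced_ratio(f0_hz):
    """Fraction of frames with a positive F0 value."""
    frames = list(f0_hz)
    if not frames:
        return 0.0
    return sum(1 for f in frames if f > 0) / len(frames)


# Loading a real track would look roughly like this (requires numpy):
#   f0 = numpy.load("example/audio/music_f0.npy").tolist()
```

A very low voiced ratio, or MIDI values far outside a plausible vocal range, would suggest that pitch extraction went wrong for that file.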

🌐 WebUI

You can launch the interactive interface for SVS (synthesis from lyrics and MIDI transcriptions) with:

python webui.py

For SVC WebUI (audio-to-audio conversion):

python webui_svc.py

Apple MLX Checkpoints

This repository also includes conversion and bridge scripts for the mlx-community SoulX-Singer checkpoint set:

mlx-community/SoulX-Singer-4bit
mlx-community/SoulX-Singer-8bit
mlx-community/SoulX-Singer-bf16
mlx-community/SoulX-Singer-fp32

Current status:

svs/ contains the SoulX-Singer singing voice synthesis checkpoint.
svc/ contains the SoulX-Singer-SVC voice conversion checkpoint.
Weights are stored as MLX-compatible safetensors.
4bit and 8bit use MLX affine quantization and can be dequantized by the bridge loader.
Full audio generation still uses the official PyTorch model structure as a bridge. This is not yet a pure end-to-end MLX runtime.
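To make the quantization item above more concrete, here is a small pure-Python sketch of group-wise affine quantization: each group of weights is stored as integer codes plus one scale and one bias, and dequantization computes `scale * code + bias`. This illustrates the general idea only. It is not the MLX kernel or the bridge loader, and the function names are hypothetical.

```python
def quantize_group(weights, bits=4):
    """Quantize one group of floats to integer codes in [0, 2**bits - 1]."""
    levels = (1 << bits) - 1
    lo, hi = min(weights), max(weights)
    # Guard against a constant group, where hi == lo.
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((w - lo) / scale) for w in weights]
    return codes, scale, lo  # lo acts as the per-group bias


def dequantize_group(codes, scale, bias):
    """Recover approximate floats from the codes, scale, and bias."""
    return [scale * c + bias for c in codes]


weights = [-1.0, -0.5, 0.0, 0.5, 1.0]
codes, scale, bias = quantize_group(weights, bits=4)
restored = dequantize_group(codes, scale, bias)
# Each restored value differs from the original by at most about scale / 2.
```

With fewer bits the scale grows, so the worst-case rounding error per weight grows with it. That is the trade-off between the 4bit and 8bit checkpoints.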

Install the bridge environment. Use Python 3.10, matching the upstream requirements:

conda create -n soulxsinger-mlx -y python=3.10
conda activate soulxsinger-mlx
python -m pip install -U pip
python -m pip install -r requirements.txt mlx safetensors huggingface_hub hf_transfer

Download a checkpoint and the local Whisper encoder used by the SVC bridge:

HF_HUB_ENABLE_HF_TRANSFER=1 hf download \
  mlx-community/SoulX-Singer-bf16 \
  --local-dir ./models/SoulX-Singer-bf16

hf download openai/whisper-base \
  --local-dir pretrained_models/openai__whisper-base

Run a short SVS bridge test:

PYTORCH_ENABLE_MPS_FALLBACK=1 \
SOULX_WHISPER_MODEL=pretrained_models/openai__whisper-base \
python scripts/inference_mlx_bridge.py \
  --model ./models/SoulX-Singer-bf16 \
  --component svs \
  --device mps \
  --prompt_wav_path example/audio/zh_prompt.mp3 \
  --prompt_metadata_path example/audio/zh_prompt.json \
  --target_metadata_path example/audio/zh_target.json \
  --control melody \
  --n_steps 1 \
  --cfg 1 \
  --save_dir outputs_mlx_bridge/svs

Run an SVC bridge test:

PYTORCH_ENABLE_MPS_FALLBACK=1 \
SOULX_WHISPER_MODEL=pretrained_models/openai__whisper-base \
python scripts/inference_mlx_bridge.py \
  --model ./models/SoulX-Singer-bf16 \
  --component svc \
  --device mps \
  --prompt_wav_path example/audio/zh_prompt.mp3 \
  --target_wav_path example/audio/music.mp3 \
  --prompt_f0_path example/audio/zh_prompt_f0.npy \
  --target_f0_path example/audio/music_f0.npy \
  --n_steps 1 \
  --cfg 1 \
  --save_dir outputs_mlx_bridge/svc
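For batch runs, the two bridge commands above can be assembled programmatically. The following is a sketch under stated assumptions: `bridge_command` and `BRIDGE_ENV` are hypothetical names, and only flags and environment variables that appear in the commands above are used. The defaults mirror the short test settings shown there, not recommended quality settings.

```python
import os

# Environment variables taken from the commands above.
BRIDGE_ENV = {
    "PYTORCH_ENABLE_MPS_FALLBACK": "1",
    "SOULX_WHISPER_MODEL": "pretrained_models/openai__whisper-base",
}


def bridge_command(component, model_dir, save_dir, inputs,
                   n_steps=1, cfg=1, device="mps"):
    """Build the argument list for scripts/inference_mlx_bridge.py.

    `inputs` maps flag names without the leading dashes (for example
    "prompt_wav_path") to their values.
    """
    cmd = [
        "python", "scripts/inference_mlx_bridge.py",
        "--model", model_dir,
        "--component", component,
        "--device", device,
    ]
    for flag, value in inputs.items():
        cmd += [f"--{flag}", str(value)]
    cmd += ["--n_steps", str(n_steps), "--cfg", str(cfg), "--save_dir", save_dir]
    return cmd


svc_cmd = bridge_command(
    "svc",
    "./models/SoulX-Singer-bf16",
    "outputs_mlx_bridge/svc",
    {
        "prompt_wav_path": "example/audio/zh_prompt.mp3",
        "target_wav_path": "example/audio/music.mp3",
        "prompt_f0_path": "example/audio/zh_prompt_f0.npy",
        "target_f0_path": "example/audio/music_f0.npy",
    },
)
# To execute from the repository root:
#   import subprocess
#   subprocess.run(svc_cmd, env={**os.environ, **BRIDGE_ENV}, check=True)
```

Passing the arguments as a list avoids shell quoting problems when audio paths contain spaces.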

Conversion commands:

python scripts/convert_mlx.py \
  --source pretrained_models/SoulX-Singer \
  --output ./models/SoulX-Singer-bf16 \
  --precision bf16

python scripts/quantize_mlx.py \
  --source ./models/SoulX-Singer-bf16 \
  --output ./models/SoulX-Singer-4bit \
  --bits 4

python scripts/shard_safetensors.py ./models/SoulX-Singer-4bit \
  --max-shard-size 128MiB \
  --remove-source
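The `--max-shard-size` option caps how large each output file may be. One simple way to plan shards is to pack tensors greedily in order, as in the sketch below. This illustrates the idea and is not necessarily how `scripts/shard_safetensors.py` works. The function names and the example tensor names are hypothetical.

```python
def parse_size(text):
    """Parse a size such as "128MiB" into bytes (binary units only)."""
    units = {"KiB": 1024, "MiB": 1024 ** 2, "GiB": 1024 ** 3}
    for suffix, factor in units.items():
        if text.endswith(suffix):
            return int(float(text[: -len(suffix)]) * factor)
    return int(text)  # plain byte count


def plan_shards(tensor_sizes, max_shard_bytes):
    """Greedily group tensor names so each shard stays within the limit.

    A single tensor larger than the limit still gets a shard of its own.
    """
    shards, current, used = [], [], 0
    for name, size in tensor_sizes.items():
        if current and used + size > max_shard_bytes:
            shards.append(current)
            current, used = [], 0
        current.append(name)
        used += size
    if current:
        shards.append(current)
    return shards


limit = parse_size("128MiB")
# Made-up tensor names and sizes, for illustration only.
sizes = {"a": 60 * 1024 ** 2, "b": 60 * 1024 ** 2, "c": 100 * 1024 ** 2, "d": 10 * 1024 ** 2}
plan = plan_shards(sizes, limit)
```

Packing in order keeps related layers together in the same file, at the cost of some unused space per shard.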

🚧 Roadmap

🖥️ Web-based UI for easy and interactive inference
🌐 Online MIDI Editor deployment on Hugging Face Spaces
🌐 Online demo deployment on Hugging Face Spaces
📊 Release the SoulX-Singer-Eval benchmark
🎹 Inference support for user-friendly MIDI-based input
📚 Comprehensive tutorials and usage documentation
🎵 Support for wav-to-wav singing voice conversion (without transcription)

🙏 Acknowledgements

Special thanks to the following open-source projects:

📄 License

SoulX-Singer is released under the Apache 2.0 license. Researchers and developers are free to use the code and model weights. See LICENSE for details.

⚠️ Usage Disclaimer

SoulX-Singer is intended for academic research, educational purposes, and legitimate applications such as personalized singing synthesis and assistive technologies.

Please note:

🎤 Respect intellectual property, privacy, and personal consent when generating singing content.
🚫 Do not use the model to impersonate individuals without authorization or to create deceptive audio.
⚠️ The developers assume no liability for any misuse of this model.

We advocate for the responsible development and use of AI and encourage the community to uphold safety and ethical principles. For ethics or misuse concerns, please contact us.

📄 Citation

If you use SoulX-Singer in your research, please cite:

@misc{soulxsinger,
      title={SoulX-Singer: Towards High-Quality Zero-Shot Singing Voice Synthesis}, 
      author={Jiale Qian and Hao Meng and Tian Zheng and Pengcheng Zhu and Haopeng Lin and Yuhang Dai and Hanke Xie and Wenxiao Cao and Ruixuan Shang and Jun Wu and Hongmei Liu and Hanlin Wen and Jian Zhao and Zhonglin Jiang and Yong Chen and Shunshun Yin and Ming Tao and Jianguo Wei and Lei Xie and Xinsheng Wang},
      year={2026},
      eprint={2602.07803},
      archivePrefix={arXiv},
      primaryClass={eess.AS},
      url={https://arxiv.org/abs/2602.07803}, 
}

📬 Contact Us

We welcome your feedback, questions, and collaboration:

Email: qianjiale@soulapp.cn | menghao@soulapp.cn | wangxinsheng@soulapp.cn
Community: join the WeChat or Soul APP groups for technical discussions and updates.
