# Deploy Z-Image-Turbo to Replicate

🚀 Replicate Model: `r8.im/leizeng/z-image-turbo`
- Model: Z-Image-Turbo (Tongyi-MAI)
- Parameters: 6B
- Inference Steps: 8 NFEs
- VRAM: 16GB
- Features:
  - Sub-second inference on H800 GPUs
  - Bilingual text rendering (English & Chinese)
  - Photorealistic image generation
  - Strong instruction adherence
```
zimage-replicate-model/
├── cog.yaml            # Cog configuration
├── predict.py          # Prediction interface (with detailed logging)
├── requirements.txt    # Python dependencies
├── download_weights.py # Model weight downloader
├── TROUBLESHOOTING.md  # Debugging guide
└── README.md           # This file
```
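The download-on-first-run behavior of `download_weights.py` can be sketched as a simple cache check. The helper below is illustrative, not the actual code; `download_fn` stands in for the real Hugging Face download call:

```python
import os
import tempfile

def ensure_weights(weights_dir, download_fn):
    """Download model weights only on first run (hypothetical helper;
    download_fn stands in for the real Hugging Face download)."""
    if not os.path.isdir(weights_dir):
        download_fn(weights_dir)  # first run: fetch ~12GB of weights
    return weights_dir

# Demo with a stub downloader that records each invocation
calls = []
def fake_download(path):
    os.makedirs(path)
    calls.append(path)

with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "weights")
    ensure_weights(target, fake_download)  # first run: downloads
    ensure_weights(target, fake_download)  # later runs: cache hit
    print(len(calls))  # 1
```

Because the check is just a directory test, restarting the container with the weights volume intact skips the download entirely.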
- ✅ Detailed Logging: Timestamped logs for every operation
- ✅ Error Handling: Full tracebacks on failures
- ✅ Auto-download Model: Model downloaded on first run (~12GB)
- ✅ Optimized Setup: Warmup run for faster predictions
- ✅ Resource Monitoring: GPU memory usage tracking
- ✅ Unbuffered Output: Real-time log visibility
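The logging and unbuffered-output behavior above can be sketched as a small helper of the kind `predict.py` might use (the function names here are illustrative, not the actual code): every message gets a timestamp, and an explicit flush keeps output visible in real time even when stdout is block-buffered.

```python
import time

def format_log(message):
    """Prefix a message with a timestamp (illustrative helper)."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    return f"[{stamp}] {message}"

def log(message):
    # flush=True keeps output unbuffered so logs appear live in the
    # Replicate dashboard instead of arriving in delayed bursts
    print(format_log(message), flush=True)

log("Loading Z-Image-Turbo weights...")
```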
- Cog installed
- Docker installed and running
- NVIDIA GPU with CUDA support (for testing)
```bash
cog build
```

This will:
- Create a Docker container with CUDA 12.1
- Install Python dependencies
- Download Z-Image-Turbo weights from Hugging Face
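For reference, a `cog.yaml` along these lines would match the build described above; treat the exact version strings and key names as assumptions to verify against the repository:

```yaml
# Sketch of a matching cog.yaml (versions are assumptions, not verified)
build:
  gpu: true
  cuda: "12.1"
  python_version: "3.11"
  python_requirements: requirements.txt
predict: "predict.py:Predictor"
```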
```bash
cog predict -i prompt="A beautiful Chinese landscape painting"
```

```bash
cog predict \
  -i prompt="Young woman in traditional Hanfu dress" \
  -i width=1024 \
  -i height=1024 \
  -i num_inference_steps=9 \
  -i seed=42
```

```bash
cog login
cog push r8.im/leizeng/z-image-turbo
```

```python
import replicate

output = replicate.run(
    "leizeng/z-image-turbo:latest",
    input={
        "prompt": "A serene mountain landscape at sunset",
        "width": 1024,
        "height": 1024,
        "seed": 42
    }
)
print(output)
```

- `prompt` (string): Text description for image generation. Supports English and Chinese.
- `width` (integer, 512-2048): Output image width. Default: 1024
- `height` (integer, 512-2048): Output image height. Default: 1024
- `num_inference_steps` (integer, 1-50): Number of denoising steps. Default: 9 (8 actual steps)
- `seed` (integer, optional): Random seed for reproducibility
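The ranges above can be checked before inference; a minimal sketch of that validation (a hypothetical helper, not the actual `predict.py` code):

```python
def validate_inputs(width, height, num_inference_steps):
    """Check inputs against the documented ranges (illustrative)."""
    for name, value, lo, hi in [
        ("width", width, 512, 2048),
        ("height", height, 512, 2048),
        ("num_inference_steps", num_inference_steps, 1, 50),
    ]:
        if not lo <= value <= hi:
            raise ValueError(f"{name}={value} is outside [{lo}, {hi}]")

validate_inputs(1024, 1024, 9)  # the defaults pass
```

On Replicate, Cog's `Input(..., ge=..., le=...)` declarations typically enforce these bounds automatically; a helper like this just makes the documented ranges explicit.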
- Build time: 10-15 minutes (downloads the ~12GB model)
- Setup time: 30-60 seconds (loading model to GPU)
- Inference time: 2-5 seconds (1024x1024, 8 steps)
- VRAM usage: ~16GB
- Recommended resolution: 1024x1024
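The 30-60 second setup time covers loading the model plus the warmup run mentioned earlier, which pays one-time lazy-initialization costs before the first real request arrives. A sketch of that pattern (illustrative stubs, not the real `predict.py`):

```python
import time

class Predictor:
    """Warmup pattern sketch (illustrative, not the real predict.py)."""

    def setup(self):
        self._load_model()               # one-time cost, ~30-60 s in practice
        self.predict("warmup", steps=1)  # pay lazy-init costs before real traffic
        self.ready = True

    def _load_model(self):
        time.sleep(0.01)                 # stands in for loading weights to GPU

    def predict(self, prompt, steps=9):
        return f"<image for {prompt!r}, {steps} steps>"

p = Predictor()
p.setup()
print(p.predict("a mountain at sunset"))
```

Running the warmup inside `setup()` means the slow path is charged to container startup rather than to the first user-visible prediction.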
If you encounter issues (model stuck, no logs, etc.), see TROUBLESHOOTING.md for a detailed debugging guide.
Z-Image is released under the Apache 2.0 license. See LICENSE.