A fast video transcription CLI tool powered by whisper.cpp. Extracts audio from video files and transcribes it to text with GPU acceleration.
- Video → Audio → Text: Automatic audio extraction via FFmpeg
- GPU Accelerated: Metal (Apple Silicon Macs), Vulkan (e.g. NVIDIA GPUs), CUDA, or CPU fallback
- Multiple Output Formats: TXT, SRT (subtitle), JSON (with word-level timestamps), MD (Markdown with YAML front matter)
- Batch Processing: Transcribe entire directories recursively
- Model Quantization: Q4_K/Q5_K/Q6_K/Q8_0 variants trade some accuracy for smaller files and faster inference
- Configurable: YAML configuration file with CLI argument overrides
- Progress Reporting: Real-time progress bars for downloads and transcription
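The video-to-audio step can be pictured as a single FFmpeg call. The sketch below is not the tool's code: the helper names `ffmpeg_extract_args` and `extract_audio` are hypothetical, and it assumes the audio is converted to 16 kHz, mono, 16-bit PCM WAV, which is the input format whisper.cpp expects.

```rust
use std::path::Path;
use std::process::Command;

/// Hypothetical helper: builds FFmpeg arguments that turn a video file into
/// the 16 kHz, mono, 16-bit PCM WAV that whisper.cpp consumes (an assumption
/// about how this tool prepares audio).
fn ffmpeg_extract_args(video: &Path, wav: &Path) -> Vec<String> {
    vec![
        "-y".to_string(),            // overwrite an existing output file
        "-i".to_string(),
        video.display().to_string(), // input video
        "-vn".to_string(),           // drop the video stream
        "-ar".to_string(),
        "16000".to_string(),         // resample to 16 kHz
        "-ac".to_string(),
        "1".to_string(),             // downmix to mono
        "-c:a".to_string(),
        "pcm_s16le".to_string(),     // 16-bit little-endian PCM
        wav.display().to_string(),   // output WAV path
    ]
}

/// Hypothetical helper: runs FFmpeg and returns an error if it cannot be
/// started (e.g. not installed) or exits with a non-zero status.
fn extract_audio(video: &Path, wav: &Path) -> std::io::Result<()> {
    let status = Command::new("ffmpeg")
        .args(ffmpeg_extract_args(video, wav))
        .status()?;
    if status.success() {
        Ok(())
    } else {
        Err(std::io::Error::new(
            std::io::ErrorKind::Other,
            format!("ffmpeg exited with {status}"),
        ))
    }
}
```

Keeping argument construction separate from process execution makes the conversion settings easy to inspect without FFmpeg installed.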
- FFmpeg: Required for audio extraction
  - macOS: `brew install ffmpeg`
  - Ubuntu: `sudo apt install ffmpeg`
  - Windows: Download from https://ffmpeg.org/
- macOS (Homebrew):

```bash
brew tap RedAtman/tap
brew install transcriber
```

This taps `RedAtman/homebrew-tap`. The formula is updated automatically on each release.
- From source:

```bash
git clone <repo-url>
cd transcriber
cargo build --release
```

The binary is built at `./target/release/transcriber`.
```bash
# Transcribe with default settings (base model, txt output)
transcriber -i video.mp4

# Specify model and language
transcriber -i video.mp4 -m medium -l zh

# Custom output directory and SRT format
transcriber -i video.mp4 -o ./subtitles --format srt
```

```bash
# Transcribe all videos in a directory
transcriber -d ./videos

# Skip already-transcribed files
transcriber -d ./videos --skip-existing

# Multiple output formats
transcriber -d ./videos --format "txt,srt,json"
```

```bash
# Initial prompt for decoder context
transcriber -i video.mp4 --initial-prompt "technical terms"

# Sampling temperature (0.0 = deterministic, 1.0 = more random)
transcriber -i video.mp4 --temperature 0.2

# Suppress non-speech tokens
transcriber -i video.mp4 --suppress-non-speech

# No-speech detection threshold
transcriber -i video.mp4 --no-speech-threshold 0.5

# Split on word boundaries
transcriber -i video.mp4 --split-on-word

# Combined examples (the second uses the Chinese prompt "科技", meaning "technology")
transcriber -i video.mp4 -m medium -l en --initial-prompt "technology" --temperature 0.3
transcriber -i video.mp4 -m medium -l zh --initial-prompt "科技" --temperature 0.3
```

```bash
# Add custom metadata to the output (stored in Markdown YAML front matter)
transcriber -i video.mp4 --format md --meta title="My Talk" --meta location=Beijing

# Media metadata is auto-detected via ffprobe when using md format:
# - source: filename, size, format, bitrate, duration
# - video: codec, resolution, FPS
# - audio: codec, sample rate, channels
```

```bash
# Generate default config file
transcriber init

# Use custom config
transcriber -i video.mp4 --config ./my-config.yaml
```

Default config location: `~/.config/transcriber/config.yaml`
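The precedence between built-in defaults, the YAML file, and CLI flags can be sketched as layered optional values. This is a minimal illustration, not the tool's implementation: `PartialSettings`, `Settings`, `over`, `resolve`, and `parse_formats` are hypothetical names, YAML parsing is left out (a YAML deserializer would fill one `PartialSettings` layer), and only the defaults stated in this README (base model, txt output) are taken from the source.

```rust
/// One layer of settings (YAML file or CLI flags); `None` means "not set here".
/// Hypothetical: covers only two options for brevity.
#[derive(Debug, Default)]
struct PartialSettings {
    model: Option<String>,
    formats: Option<Vec<String>>,
}

/// Fully resolved settings used for a run.
#[derive(Debug, PartialEq)]
struct Settings {
    model: String,
    formats: Vec<String>,
}

impl PartialSettings {
    /// Layers `self` on top of `lower`: any value set in `self` wins.
    fn over(self, lower: PartialSettings) -> PartialSettings {
        PartialSettings {
            model: self.model.or(lower.model),
            formats: self.formats.or(lower.formats),
        }
    }

    /// Fills anything still unset with the documented defaults: base model, txt output.
    fn resolve(self) -> Settings {
        Settings {
            model: self.model.unwrap_or_else(|| "base".to_string()),
            formats: self.formats.unwrap_or_else(|| vec!["txt".to_string()]),
        }
    }
}

/// Splits a `--format` value such as "txt,srt,json" into trimmed, lowercase names.
fn parse_formats(value: &str) -> Vec<String> {
    value
        .split(',')
        .map(|s| s.trim().to_ascii_lowercase())
        .filter(|s| !s.is_empty())
        .collect()
}
```

For example, `cli.over(file_layer).resolve()` lets `-m medium` on the command line win over a `small` model set in the YAML file, while a format set in neither layer falls back to `txt`.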
| Model | Size | Description |
|---|---|---|
| tiny | 75 MB | Fastest, lowest quality |
| base | 148 MB | Default, balanced |
| small | 488 MB | Better quality |
| medium | 1.5 GB | High quality |
| large-v3-turbo | 800 MB | Best quality/speed ratio |
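Validating a `-m` value against the table and locating its weights might look like the sketch below. It is an assumption, not documented behaviour: the hypothetical `model_url` helper supposes the tool downloads the standard unquantized files from the public `ggerganov/whisper.cpp` repository on Hugging Face, which use `ggml-<name>.bin` naming; the tool may fetch different (e.g. quantized) files.

```rust
/// Model names from the table above.
const MODELS: [&str; 5] = ["tiny", "base", "small", "medium", "large-v3-turbo"];

/// Hypothetical helper: maps a model name to a whisper.cpp ggml download URL.
/// ASSUMPTION: files come from the ggerganov/whisper.cpp Hugging Face repo.
/// Returns `None` for names not listed in the table.
fn model_url(name: &str) -> Option<String> {
    if MODELS.iter().any(|m| *m == name) {
        Some(format!(
            "https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-{name}.bin"
        ))
    } else {
        None
    }
}
```

Rejecting unknown names before any network request gives a clearer error than a failed download.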
- TXT: Plain text, one segment per line
- SRT: SubRip subtitle format with timestamps
- JSON: Structured data with word-level timing and metadata
- MD: Markdown with YAML front matter — includes auto-detected media metadata (source, video, audio info), custom metadata, and plain text body (no timestamps)
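The SRT output follows the SubRip layout: a 1-based index, a `start --> end` line with `HH:MM:SS,mmm` timestamps (comma before the milliseconds), the text, and a blank line. The sketch below renders that layout; the `Segment` struct and the millisecond time unit are illustrative assumptions, not the tool's internal types.

```rust
/// Hypothetical transcribed segment; times are in milliseconds.
struct Segment {
    start_ms: u64,
    end_ms: u64,
    text: String,
}

/// Formats milliseconds as an SRT timestamp: HH:MM:SS,mmm.
fn srt_timestamp(ms: u64) -> String {
    let hours = ms / 3_600_000;
    let minutes = (ms % 3_600_000) / 60_000;
    let seconds = (ms % 60_000) / 1_000;
    let millis = ms % 1_000;
    format!("{hours:02}:{minutes:02}:{seconds:02},{millis:03}")
}

/// Renders segments as SubRip: index, time range, trimmed text, blank line.
fn to_srt(segments: &[Segment]) -> String {
    let mut out = String::new();
    for (i, seg) in segments.iter().enumerate() {
        out.push_str(&format!(
            "{}\n{} --> {}\n{}\n\n",
            i + 1,
            srt_timestamp(seg.start_ms),
            srt_timestamp(seg.end_ms),
            seg.text.trim()
        ));
    }
    out
}
```

For instance, `srt_timestamp(3_723_045)` yields `01:02:03,045`.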
License: MIT