Extract audio from video files and generate SRT subtitle files using local Whisper speech recognition. Optionally clean up transcriptions or translate them into another language using Claude.
```bash
# 1. Create a virtual environment (Python 3.12 recommended)
python3.12 -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate

# 2. Install dependencies
pip install -r requirements.txt

# 3. Install ffmpeg (if not already installed)
# macOS
brew install ffmpeg
# Ubuntu / Debian
sudo apt install ffmpeg
```
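To confirm that step 3 worked before the first run, you can check that `ffmpeg` is on your `PATH` and starts cleanly. The helper below is a standalone sketch and is not part of `video2srt.py`. Running `ffmpeg -version` directly in a shell performs the same check.

```python
import shutil
import subprocess


def ffmpeg_available() -> bool:
    """Return True if an `ffmpeg` executable is on PATH and runs successfully."""
    path = shutil.which("ffmpeg")
    if path is None:
        return False
    # `ffmpeg -version` prints build info and exits with status 0 on a working install.
    result = subprocess.run(
        [path, "-version"], capture_output=True, text=True, check=False
    )
    return result.returncode == 0


if __name__ == "__main__":
    print("ffmpeg found" if ffmpeg_available() else "ffmpeg missing: see step 3")
```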
```bash
python video2srt.py [OPTIONS] INPUT
```
| Argument | Description |
|---|---|
| `INPUT` | Path to a video file or a directory of video files |
| Option | Default | Description |
|---|---|---|
| `-o, --output PATH` | same directory as input | Output `.srt` file (single file) or output directory (batch mode) |
| `-m, --model MODEL` | `base` | Whisper model: `tiny`, `base`, `small`, `medium`, `large` |
| `-l, --language LANG` | auto-detect | Source-language hint, e.g. `en`, `zh` |
| `--translate LANG` | none | Translate subtitles into `LANG` via Claude, e.g. `zh-CN`, `Japanese` |
| `--optimize` | off | Extra Claude pass to fix punctuation and phrasing |
| `--claude-model MODEL` | `claude-opus-4-5` | Anthropic model ID used for `--optimize` and `--translate` |
| `-e, --extensions EXTS` | `mp4,mkv,mov,avi,webm` | Comma-separated file extensions to pick up in batch mode |
| `-v, --verbose` | off | Print each segment as it is transcribed |
| `--version` | n/a | Show version and exit |
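For readers who want to script around the tool, here is a hypothetical `argparse` declaration that mirrors the option table. It is a sketch reconstructed from the table and not the actual source of `video2srt.py`. The version string is a placeholder, and the `choices` list for `--model` is an assumption based on the listed model names.

```python
import argparse

WHISPER_MODELS = ["tiny", "base", "small", "medium", "large"]


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented CLI options."""
    p = argparse.ArgumentParser(prog="video2srt.py")
    p.add_argument("input", metavar="INPUT",
                   help="video file or directory of video files")
    p.add_argument("-o", "--output", metavar="PATH", default=None,
                   help=".srt file (single) or directory (batch); default: next to input")
    p.add_argument("-m", "--model", default="base", choices=WHISPER_MODELS)
    p.add_argument("-l", "--language", metavar="LANG", default=None,
                   help="source language hint; None means auto-detect")
    p.add_argument("--translate", metavar="LANG", default=None)
    p.add_argument("--optimize", action="store_true")
    p.add_argument("--claude-model", metavar="MODEL", default="claude-opus-4-5")
    p.add_argument("-e", "--extensions", metavar="EXTS",
                   default="mp4,mkv,mov,avi,webm")
    p.add_argument("-v", "--verbose", action="store_true")
    # Placeholder: the real version string is not documented here.
    p.add_argument("--version", action="version", version="%(prog)s (placeholder)")
    return p


if __name__ == "__main__":
    args = build_parser().parse_args(["lecture.mp4", "--optimize", "-m", "small"])
    print(args.model, args.optimize, args.language)  # small True None
```

Note that `argparse` maps `--claude-model` to the attribute `args.claude_model`.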
```bash
# Basic: transcribe a single file (output: video.srt next to the video)
python video2srt.py lecture.mp4

# Specify output path
python video2srt.py lecture.mp4 -o /tmp/lecture.srt

# Use a larger model for better accuracy
python video2srt.py lecture.mp4 -m small

# Hint the source language (skips auto-detection and avoids misdetected languages)
python video2srt.py lecture.mp4 -l en

# Clean up phrasing with Claude (requires an Anthropic API key)
export ANTHROPIC_API_KEY=sk-ant-...
python video2srt.py lecture.mp4 --optimize

# Translate to Chinese (Simplified)
python video2srt.py lecture.mp4 --translate "Chinese (Simplified)"

# Optimize AND translate in one run
python video2srt.py lecture.mp4 --optimize --translate "Chinese (Simplified)"

# Batch: process every video in a directory
python video2srt.py /path/to/videos/

# Batch with a custom output directory
python video2srt.py /path/to/videos/ -o /path/to/subtitles/
```
```
INPUT
 └─ [batch]       find_videos()          # directory mode: discover video files
 └─ [audio]       extract_audio()        # ffmpeg → 16 kHz mono WAV (temp file)
 └─ [transcribe]  transcribe()           # Whisper → segments
 └─ [llm]         optimize_segments()    # optional (--optimize): Claude cleanup
 └─ [llm]         translate_segments()   # optional (--translate): Claude translation
 └─ [srt]         write_srt()            # write .srt file
```
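The `[audio]` step converts the video's soundtrack into a 16 kHz mono WAV. The following is a minimal sketch of that step, assuming a temp file created with `tempfile.mkstemp` and standard ffmpeg flags. The names `ffmpeg_extract_cmd` and `extracted_audio` are hypothetical and not taken from `video2srt.py`.

```python
import os
import subprocess
import tempfile
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator


def ffmpeg_extract_cmd(video: Path, wav: Path) -> list[str]:
    """ffmpeg arguments for a 16 kHz, mono, 16-bit PCM WAV."""
    return [
        "ffmpeg", "-nostdin", "-y",  # never prompt; overwrite the empty temp file
        "-i", str(video),
        "-vn",                       # drop the video stream
        "-ac", "1",                  # mono
        "-ar", "16000",              # 16 kHz sample rate
        "-acodec", "pcm_s16le",      # uncompressed 16-bit PCM
        str(wav),
    ]


@contextmanager
def extracted_audio(video: Path) -> Iterator[Path]:
    """Yield a temporary WAV for `video` and delete it when the block exits."""
    fd, name = tempfile.mkstemp(suffix=".wav")
    os.close(fd)  # ffmpeg opens the path itself
    wav = Path(name)
    try:
        subprocess.run(ffmpeg_extract_cmd(video, wav), check=True, capture_output=True)
        yield wav
    finally:
        wav.unlink(missing_ok=True)  # cleanup runs even if ffmpeg or Whisper fails
```

Typical usage wraps the transcription call, so the WAV file exists only while Whisper reads it. For example, `with extracted_audio(Path("lecture.mp4")) as wav:` followed by the transcription step.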
Temporary WAV files are deleted immediately after transcription.
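The final `[srt]` stage writes numbered cues with `HH:MM:SS,mmm --> HH:MM:SS,mmm` timestamps, where SRT uses a comma before the milliseconds. The sketch below shows that rendering. It assumes each segment carries `start` and `end` times in seconds plus `text`, which is the shape Whisper segments have. The `Segment` dataclass is hypothetical, and the real `write_srt()` signature is not documented here.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float  # seconds
    end: float    # seconds
    text: str


def srt_timestamp(seconds: float) -> str:
    """Format seconds as HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: list[Segment]) -> str:
    """Render 1-based numbered cues separated by blank lines."""
    cues = []
    for i, seg in enumerate(segments, start=1):
        cues.append(
            f"{i}\n{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}\n{seg.text.strip()}\n"
        )
    return "\n".join(cues)
```

Writing the result with `Path("lecture.srt").write_text(segments_to_srt(segs), encoding="utf-8")` keeps non-ASCII subtitles, such as translated Chinese, intact.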
| Model | Speed | Accuracy | Approx. VRAM |
|---|---|---|---|
| `tiny` | fastest | lowest | ~1 GB |
| `base` | fast | good | ~1 GB |
| `small` | moderate | better | ~2 GB |
| `medium` | slow | great | ~5 GB |
| `large` | slowest | best | ~10 GB |
`base` is the default and works well for clear speech. Models are downloaded automatically on first use (roughly 100 MB to 3 GB, depending on the model).
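To choose a model for a given GPU, you can pick the most accurate entry in the table that fits your VRAM. The helper below is a hypothetical sketch and not a `video2srt.py` option. It uses the rough VRAM figures from the table, and actual memory use may differ.

```python
# Approximate VRAM needs (GB) from the model table, ordered from least to most accurate.
MODEL_VRAM_GB = {"tiny": 1, "base": 1, "small": 2, "medium": 5, "large": 10}


def largest_model_for(vram_gb: float) -> str:
    """Most accurate model whose approximate VRAM need fits; falls back to 'tiny'."""
    best = "tiny"
    for name, need in MODEL_VRAM_GB.items():  # dicts preserve insertion order
        if need <= vram_gb:
            best = name
    return best
```

For example, an 8 GB card yields `medium`, which you would then pass as `-m medium`.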