video2srt

Extract audio from video files and generate SRT subtitle files using local Whisper speech recognition. Optionally clean up transcriptions or translate them into another language using Claude.

Requirements

Python 3.10+
ffmpeg (must be on your PATH)
An Anthropic API key (only needed for `--optimize` / `--translate`)
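
The requirements above can be checked before any work starts. The following is a minimal sketch, not the tool's actual code: the function name `missing_requirements` and its parameters are hypothetical, and it assumes the key is read from the `ANTHROPIC_API_KEY` environment variable, as in the examples further down.

```python
import os
import shutil


def missing_requirements(use_claude, env=None, which=shutil.which):
    """Return a list of human-readable problems; an empty list means ready to run.

    `use_claude` should be True when --optimize or --translate was requested.
    `env` and `which` are injectable so the check can be tested without
    touching the real environment (hypothetical design, not from the project).
    """
    env = os.environ if env is None else env
    problems = []
    # ffmpeg is always needed for audio extraction.
    if which("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH")
    # The API key matters only when a Claude step is requested.
    if use_claude and not env.get("ANTHROPIC_API_KEY"):
        problems.append("ANTHROPIC_API_KEY is not set")
    return problems
```

Calling `missing_requirements(use_claude=False)` on a machine with ffmpeg installed should return an empty list.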

Installation

# 1. Create a virtual environment (Python 3.12 recommended)
python3.12 -m venv venv
source venv/bin/activate        # Windows: venv\Scripts\activate

# 2. Install dependencies
pip install -r requirements.txt

# 3. Install ffmpeg (if not already installed)
# macOS
brew install ffmpeg
# Ubuntu / Debian
sudo apt install ffmpeg

Usage

python video2srt.py [OPTIONS] INPUT

Positional argument

| Argument | Description |
| --- | --- |
| `INPUT` | Path to a video file or a directory of video files |

Options

| Option | Default | Description |
| --- | --- | --- |
| `-o, --output PATH` | same dir as input | Output `.srt` file (single) or directory (batch) |
| `-m, --model MODEL` | `base` | Whisper model: `tiny`, `base`, `small`, `medium`, `large` |
| `-l, --language LANG` | auto-detect | Source language hint, e.g. `en`, `zh` |
| `--translate LANG` | - | Translate subtitles via Claude, e.g. `zh-CN`, `Japanese` |
| `--optimize` | - | Claude pass to fix punctuation and phrasing |
| `--claude-model MODEL` | `claude-opus-4-5` | Anthropic model ID |
| `-e, --extensions EXTS` | `mp4,mkv,mov,avi,webm` | Comma-separated extensions for batch mode |
| `-v, --verbose` | - | Print each segment as it is transcribed |
| `--version` | - | Show version and exit |
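
For readers who want to see how such an interface maps onto `argparse`, here is a sketch that mirrors the options and defaults listed above. It is an illustration only: the real parser in `video2srt.py` may be organised differently, `build_parser` is a hypothetical name, and the version string is a placeholder because this README does not state one.

```python
import argparse


def build_parser():
    """Build a parser matching the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(prog="video2srt.py")
    parser.add_argument("input", metavar="INPUT",
                        help="video file or directory of video files")
    parser.add_argument("-o", "--output", metavar="PATH", default=None,
                        help="output .srt file (single) or directory (batch)")
    parser.add_argument("-m", "--model", default="base",
                        choices=["tiny", "base", "small", "medium", "large"],
                        help="Whisper model")
    parser.add_argument("-l", "--language", metavar="LANG", default=None,
                        help="source language hint, e.g. en, zh")
    parser.add_argument("--translate", metavar="LANG", default=None,
                        help="translate subtitles via Claude")
    parser.add_argument("--optimize", action="store_true",
                        help="Claude pass to fix punctuation and phrasing")
    parser.add_argument("--claude-model", metavar="MODEL",
                        default="claude-opus-4-5", help="Anthropic model ID")
    parser.add_argument("-e", "--extensions", metavar="EXTS",
                        default="mp4,mkv,mov,avi,webm",
                        help="comma-separated extensions for batch mode")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="print each segment as it is transcribed")
    # Placeholder version string; the real value is not given in this README.
    parser.add_argument("--version", action="version", version="%(prog)s 0.0.0")
    return parser
```

With this sketch, `build_parser().parse_args(["lecture.mp4"])` yields `model == "base"` and leaves `output`, `language`, and `translate` as `None`.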

Examples

# Basic: transcribe a single file (output: a .srt file next to the video)
python video2srt.py lecture.mp4

# Specify output path
python video2srt.py lecture.mp4 -o /tmp/lecture.srt

# Use a larger model for better accuracy
python video2srt.py lecture.mp4 -m small

# Hint the source language (faster, more accurate)
python video2srt.py lecture.mp4 -l en

# Clean up phrasing with Claude
export ANTHROPIC_API_KEY=sk-ant-...
python video2srt.py lecture.mp4 --optimize

# Translate to Chinese (Simplified)
python video2srt.py lecture.mp4 --translate "Chinese (Simplified)"

# Optimize AND translate in one run
python video2srt.py lecture.mp4 --optimize --translate "Chinese (Simplified)"

# Batch — process every video in a directory
python video2srt.py /path/to/videos/

# Batch with a custom output directory
python video2srt.py /path/to/videos/ -o /path/to/subtitles/
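
The batch examples above imply two small steps: discovering videos by extension and deciding where each `.srt` goes. The sketch below shows one way to do both. It makes assumptions the README does not confirm: discovery is non-recursive and case-insensitive, the output file reuses the video's base name, and `srt_path_for` is a hypothetical helper (only the name `find_videos` appears in the data flow).

```python
from pathlib import Path


def find_videos(directory, extensions="mp4,mkv,mov,avi,webm"):
    """Return video files directly inside `directory`, sorted by path."""
    # Normalise "mp4, .MKV" style input into {".mp4", ".mkv"}.
    wanted = {
        "." + ext.strip().lower().lstrip(".")
        for ext in extensions.split(",")
        if ext.strip()
    }
    return sorted(
        p for p in Path(directory).iterdir()
        if p.is_file() and p.suffix.lower() in wanted
    )


def srt_path_for(video, output_dir=None):
    """Map a video path to its .srt path, next to the video by default."""
    video = Path(video)
    target_dir = Path(output_dir) if output_dir else video.parent
    return target_dir / (video.stem + ".srt")
```

Under these assumptions, `srt_path_for("/path/to/videos/lecture.mp4", "/path/to/subtitles")` gives `/path/to/subtitles/lecture.srt`.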

Data Flow

INPUT
  └─ [batch] find_videos()          # directory mode: discover video files
       └─ [audio] extract_audio()   # ffmpeg → 16kHz mono WAV (temp file)
            └─ [transcribe] transcribe()   # Whisper → segments
                 └─ [llm] optimize_segments()   # optional: Claude cleanup
                      └─ [llm] translate_segments()  # optional: Claude translation
                           └─ [srt] write_srt()  # write .srt file

Temporary WAV files are deleted immediately after transcription.
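
To make the first and last stages of this flow concrete, the sketch below builds an ffmpeg command for a 16 kHz mono WAV and renders segments as SRT text. It is not the project's code: `ffmpeg_command`, `format_timestamp`, and `to_srt` are hypothetical names, and the segment shape (dicts with `start`, `end`, `text`, times in seconds) is an assumption.

```python
def ffmpeg_command(video_path, wav_path):
    """Build an ffmpeg call that writes 16 kHz mono 16-bit PCM WAV."""
    return [
        "ffmpeg", "-y",        # overwrite the temp file if it already exists
        "-i", str(video_path),
        "-vn",                 # ignore the video stream
        "-ac", "1",            # mono
        "-ar", "16000",        # 16 kHz sample rate
        "-c:a", "pcm_s16le",   # 16-bit PCM audio
        str(wav_path),
    ]


def format_timestamp(seconds):
    """Format a time in seconds as the SRT timestamp HH:MM:SS,mmm."""
    total_ms = max(0, round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments):
    """Render segments (assumed dicts with start, end, text) as SRT text."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{format_timestamp(seg['start'])} --> {format_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    # A blank line separates consecutive cues.
    return "\n".join(blocks)
```

The command list can be passed to `subprocess.run(..., check=True)`, with the WAV path taken from `tempfile` and removed in a `finally` block so the temporary file does not outlive transcription.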

Whisper Models

| Model | Speed | Accuracy | ~VRAM |
| --- | --- | --- | --- |
| tiny | fastest | lowest | ~1 GB |
| base | fast | good | ~1 GB |
| small | moderate | better | ~2 GB |
| medium | slow | great | ~5 GB |
| large | slowest | best | ~10 GB |

`base` is the default and works well for clear speech. Models are downloaded automatically on first use (roughly 100 MB to 3 GB, depending on the model).
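
If you are unsure which model to pass to `-m`, the VRAM column can be turned into a simple lookup. This is a sketch only: `largest_model_for` is a hypothetical helper that is not part of the tool, and the figures are the approximate values from the table, so treat the result as a starting point.

```python
# Approximate VRAM needs in GB, copied from the model table.
# Insertion order runs from the smallest model to the largest.
MODEL_VRAM_GB = {"tiny": 1, "base": 1, "small": 2, "medium": 5, "large": 10}


def largest_model_for(vram_gb):
    """Return the largest model whose listed VRAM fits, or None if none fits."""
    fitting = [name for name, need in MODEL_VRAM_GB.items() if need <= vram_gb]
    # The last fitting entry is the largest, because the dict is ordered by size.
    return fitting[-1] if fitting else None
```

For a GPU with 8 GB of memory this picks `medium`, which could then be used as `python video2srt.py lecture.mp4 -m medium`.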

