rediceli/video2subtitle

video2srt

Extract audio from video files and generate SRT subtitle files using local Whisper speech recognition. Optionally clean up transcriptions or translate them into another language using Claude.


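Requirements

  • Python 3.10 or later
  • ffmpeg (must be on the system PATH)
  • Anthropic API key, required only when using --optimize or --translate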

Installation

# 1. Create a virtual environment (Python 3.12 recommended)
python3.12 -m venv venv
source venv/bin/activate        # Windows: venv\Scripts\activate

# 2. Install dependencies
pip install -r requirements.txt

# 3. Install ffmpeg (if not already installed)
# macOS
brew install ffmpeg
# Ubuntu / Debian
sudo apt install ffmpeg

Usage

python video2srt.py [OPTIONS] INPUT

Positional argument

| Argument | Description |
| --- | --- |
| INPUT | Path to a video file or a directory of video files |

Options

| Option | Default | Description |
| --- | --- | --- |
| -o, --output PATH | same dir as input | Output .srt file (single) or directory (batch) |
| -m, --model MODEL | base | Whisper model: tiny, base, small, medium, large |
| -l, --language LANG | auto-detect | Source language hint, e.g. en, zh |
| --translate LANG | | Translate subtitles via Claude, e.g. zh-CN, Japanese |
| --optimize | | Claude pass to fix punctuation and phrasing |
| --claude-model MODEL | claude-opus-4-5 | Anthropic model ID |
| -e, --extensions EXTS | mp4,mkv,mov,avi,webm | Comma-separated extensions for batch mode |
| -v, --verbose | | Print each segment as it is transcribed |
| --version | | Show version and exit |
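
The options above map naturally onto an argparse parser. The following is a minimal sketch under that assumption, not the actual code in video2srt.py: build_parser is a hypothetical name, the help strings are paraphrased from the table, and the version string is a placeholder.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented options (a sketch only)."""
    parser = argparse.ArgumentParser(
        prog="video2srt.py",
        description="Generate SRT subtitles from video files.",
    )
    parser.add_argument("input", metavar="INPUT",
                        help="video file or directory of video files")
    parser.add_argument("-o", "--output", metavar="PATH", default=None,
                        help="output .srt file (single) or directory (batch)")
    parser.add_argument("-m", "--model", default="base",
                        choices=["tiny", "base", "small", "medium", "large"],
                        help="Whisper model size")
    parser.add_argument("-l", "--language", metavar="LANG", default=None,
                        help="source language hint; auto-detect when omitted")
    parser.add_argument("--translate", metavar="LANG", default=None,
                        help="translate subtitles via Claude")
    parser.add_argument("--optimize", action="store_true",
                        help="Claude pass to fix punctuation and phrasing")
    parser.add_argument("--claude-model", metavar="MODEL",
                        default="claude-opus-4-5", help="Anthropic model ID")
    parser.add_argument("-e", "--extensions", metavar="EXTS",
                        default="mp4,mkv,mov,avi,webm",
                        help="comma-separated extensions for batch mode")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="print each segment as it is transcribed")
    # Placeholder version text; the real version number is not stated here.
    parser.add_argument("--version", action="version",
                        version="%(prog)s (version placeholder)")
    return parser


args = build_parser().parse_args(["lecture.mp4", "-m", "small", "--translate", "zh-CN"])
print(args.model, args.translate, args.optimize)  # small zh-CN False
```

Restricting --model with choices makes a typo such as -m smal fail at parse time rather than partway through a model download.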

Examples

# Basic — transcribe a single file (output: video.srt next to the video)
python video2srt.py lecture.mp4

# Specify output path
python video2srt.py lecture.mp4 -o /tmp/lecture.srt

# Use a larger model for better accuracy
python video2srt.py lecture.mp4 -m small

# Hint the source language (faster, more accurate)
python video2srt.py lecture.mp4 -l en

# Clean up phrasing with Claude
export ANTHROPIC_API_KEY=sk-ant-...
python video2srt.py lecture.mp4 --optimize

# Translate to Chinese (Simplified)
python video2srt.py lecture.mp4 --translate "Chinese (Simplified)"

# Optimize AND translate in one pass
python video2srt.py lecture.mp4 --optimize --translate "Chinese (Simplified)"

# Batch — process every video in a directory
python video2srt.py /path/to/videos/

# Batch with a custom output directory
python video2srt.py /path/to/videos/ -o /path/to/subtitles/
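
The two batch examples rely on discovering videos by extension and deriving an output path for each one. A rough sketch of that logic follows. find_videos is the name used in the Data Flow section, but its signature here is assumed; srt_path_for is a hypothetical helper; and the non-recursive directory scan is an assumption about the tool's behavior.

```python
from pathlib import Path

DEFAULT_EXTENSIONS = "mp4,mkv,mov,avi,webm"  # default of -e / --extensions


def find_videos(directory, extensions=DEFAULT_EXTENSIONS):
    """Return video files directly inside `directory`, sorted by path.

    Assumption: the scan is non-recursive and extension matching is
    case-insensitive.
    """
    wanted = {
        "." + ext.strip().lower().lstrip(".")
        for ext in extensions.split(",")
        if ext.strip()
    }
    return sorted(
        p for p in Path(directory).iterdir()
        if p.is_file() and p.suffix.lower() in wanted
    )


def srt_path_for(video, output_dir=None):
    """Place the .srt next to the video, or inside `output_dir` when given."""
    video = Path(video)
    target_dir = Path(output_dir) if output_dir else video.parent
    return target_dir / (video.stem + ".srt")
```

With these helpers, a batch run is a loop over find_videos(input_dir) that writes each result to srt_path_for(video, output_dir).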

Data Flow

INPUT
  └─ [batch] find_videos()          # directory mode: discover video files
       └─ [audio] extract_audio()   # ffmpeg → 16kHz mono WAV (temp file)
            └─ [transcribe] transcribe()   # Whisper → segments
                 └─ [llm] optimize_segments()   # optional: Claude cleanup
                      └─ [llm] translate_segments()  # optional: Claude translation
                           └─ [srt] write_srt()  # write .srt file
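
The final write_srt() step has to turn timed segments into SRT cues. Below is a minimal sketch of that formatting. It assumes each segment is a dict with start and end times in seconds plus a text field, which is the shape the openai-whisper package returns; whether this project uses that exact shape is an assumption, and format_timestamp and render_srt are hypothetical names.

```python
def format_timestamp(seconds: float) -> str:
    """Convert seconds to the SRT timestamp form HH:MM:SS,mmm."""
    millis = max(0, round(seconds * 1000))
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def render_srt(segments) -> str:
    """Render segments as SRT text: index, time range, text, blank line."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    # Joining with a newline leaves one blank line between consecutive cues.
    return "\n".join(blocks)


print(format_timestamp(3661.5))  # 01:01:01,500
```

SRT uses a comma, not a period, before the milliseconds, which is the detail most often gotten wrong when writing this by hand.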

Temporary WAV files are deleted immediately after transcription.
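
The audio step and the temp-file cleanup could look roughly like the sketch below. The ffmpeg flags are standard ones for producing 16 kHz mono 16-bit PCM, but the exact command the tool runs is not shown in this README, so treat build_ffmpeg_command and with_extracted_audio as hypothetical helpers.

```python
import os
import subprocess
import tempfile


def build_ffmpeg_command(video_path, wav_path):
    """Build an ffmpeg command that writes 16 kHz mono 16-bit PCM WAV."""
    return [
        "ffmpeg",
        "-y",                 # overwrite the (empty) temp file
        "-i", str(video_path),
        "-vn",                # drop the video stream
        "-ac", "1",           # mono
        "-ar", "16000",       # 16 kHz sample rate
        "-c:a", "pcm_s16le",  # uncompressed 16-bit PCM
        str(wav_path),
    ]


def with_extracted_audio(video_path, consume):
    """Extract audio to a temp WAV, pass its path to `consume`, then delete it."""
    fd, wav_path = tempfile.mkstemp(suffix=".wav")
    os.close(fd)
    try:
        subprocess.run(build_ffmpeg_command(video_path, wav_path),
                       check=True, capture_output=True)
        return consume(wav_path)
    finally:
        # Runs whether ffmpeg or the consumer succeeded or failed.
        if os.path.exists(wav_path):
            os.remove(wav_path)
```

Putting the removal in a finally block means the WAV is deleted even when ffmpeg or the transcription step raises.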

Whisper Models

| Model | Speed | Accuracy | ~VRAM |
| --- | --- | --- | --- |
| tiny | fastest | lowest | ~1 GB |
| base | fast | good | ~1 GB |
| small | moderate | better | ~2 GB |
| medium | slow | great | ~5 GB |
| large | slowest | best | ~10 GB |

base is the default and works well for clear speech. Models are downloaded automatically on first use (~100 MB – 3 GB).
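
As a rough illustration of how the table can guide the -m choice, the sketch below picks the largest model whose approximate VRAM figure fits a given budget. pick_model is a hypothetical helper, not part of the tool, and the figures are the approximate values from the table rather than measured limits.

```python
# Approximate VRAM needs in GB, copied from the table above,
# ordered from smallest to largest model.
MODEL_VRAM_GB = {"tiny": 1, "base": 1, "small": 2, "medium": 5, "large": 10}


def pick_model(available_vram_gb: float) -> str:
    """Return the largest model whose approximate VRAM need fits the budget.

    Falls back to "tiny" when nothing fits.
    """
    fitting = [name for name, need in MODEL_VRAM_GB.items()
               if need <= available_vram_gb]
    return fitting[-1] if fitting else "tiny"


print(pick_model(6))  # medium
```

The result can be passed straight to the -m option, for example -m medium on a GPU with about 6 GB free.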



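video2srt (translated from the Chinese documentation)

Extract audio from video files and generate SRT subtitle files using local Whisper speech recognition. Optionally polish the transcription or translate it into another language using Claude.


Requirements

  • Python 3.10 or later
  • ffmpeg (must be on the system PATH)
  • Anthropic API key, required only when using --optimize or --translate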

Installation

# 1. Create a virtual environment (Python 3.12 recommended)
python3.12 -m venv venv
source venv/bin/activate        # Windows: venv\Scripts\activate

# 2. Install dependencies
pip install -r requirements.txt

# 3. Install ffmpeg (if not already installed)
# macOS
brew install ffmpeg
# Ubuntu / Debian
sudo apt install ffmpeg

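Usage

python video2srt.py [OPTIONS] INPUT

Positional argument

| Argument | Description |
| --- | --- |
| INPUT | Path to a video file, or to a directory containing videos |

Options

| Option | Default | Description |
| --- | --- | --- |
| -o, --output PATH | same directory as the input file | Output .srt file path (single file) or directory (batch) |
| -m, --model MODEL | base | Whisper model: tiny, base, small, medium, large |
| -l, --language LANG | auto-detect | Source language hint, e.g. en, zh |
| --translate LANG | | Translate subtitles via Claude, e.g. Chinese (Simplified), Japanese |
| --optimize | | Fix punctuation and phrasing via Claude |
| --claude-model MODEL | claude-opus-4-5 | Anthropic model ID to use |
| -e, --extensions EXTS | mp4,mkv,mov,avi,webm | Video extensions for batch mode (comma-separated) |
| -v, --verbose | | Print the transcription segment by segment |
| --version | | Show version and exit |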

Examples

# Basic usage: transcribe a single video (the .srt is written next to the video)
python video2srt.py lecture.mp4

# Specify the output path
python video2srt.py lecture.mp4 -o /tmp/lecture.srt

# Use a larger model for better accuracy
python video2srt.py lecture.mp4 -m small

# Specify the source language (faster, more accurate)
python video2srt.py lecture.mp4 -l zh

# Polish the phrasing with Claude
export ANTHROPIC_API_KEY=sk-ant-...
python video2srt.py lecture.mp4 --optimize

# Translate to Simplified Chinese
python video2srt.py lecture.mp4 --translate "Chinese (Simplified)"

# Polish and translate in one pass
python video2srt.py lecture.mp4 --optimize --translate "Chinese (Simplified)"

# Batch mode: process every video in a directory
python video2srt.py /path/to/videos/

# Batch mode with a custom output directory
python video2srt.py /path/to/videos/ -o /path/to/subtitles/

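Whisper Model Comparison

| Model | Speed | Accuracy | ~VRAM |
| --- | --- | --- | --- |
| tiny | fastest | lowest | ~1 GB |
| base | fast | good | ~1 GB |
| small | moderate | better | ~2 GB |
| medium | slow | great | ~5 GB |
| large | slowest | best | ~10 GB |

The base model is the default and suits clear speech. Models are downloaded automatically on first use (about 100 MB – 3 GB).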
