This is a C++ implementation of the TransNetV2 video shot boundary detection model. It supports dynamic input shapes, multi-threaded inference, and automatic video splitting.
- Multi-threaded parallel processing
  - 4 threads run inference simultaneously
- Dynamic shape support
  - Supports variable-length video input
- Embedded model
  - Model data is embedded directly in the executable
  - No external model file is required
- transnet.exe - video shot detection inference
- video_splitter.exe - automatic video splitting
- transnet_dll.dll - C ABI interface library
- PyTorch (LibTorch) - model inference
- OpenCV 4.12.0 - video processing
- CMake - project build
- C++17 - programming language
go-TransNet/
├── archive/
│   ├── src/
│   │   ├── transnet.cpp          # Main inference program
│   │   ├── transnet_dll.cpp      # DLL implementation
│   │   ├── transnet_dll.h        # DLL header
│   │   ├── video_splitter.cpp    # Video splitter
│   │   ├── model_data.cpp        # Embedded model data
│   │   ├── model_data.h          # Model data header
│   │   ├── model_to_header.py    # Model conversion script
│   │   └── test_model.py         # Test script
│   ├── libtorch/                 # PyTorch libraries
│   ├── opencv4120/               # OpenCV libraries
│   ├── install/                  # Build output
│   └── CMakeLists.txt            # CMake configuration
├── example/                      # Example videos
└── build/                        # Build directory
cd archive
mkdir build
cd build
cmake .. -G "Visual Studio 17 2022" -A x64
cmake --build . --config Release

# Use the default output directory ./output
transnet.exe video.mp4

# Specify an output directory
transnet.exe video.mp4 my_output

# Use the default output directory ./segments
video_splitter.exe video.mp4 output/scenes.txt

# Specify an output directory
video_splitter.exe video.mp4 output/scenes.txt my_segments

// transnet.exe video.mp4
//
// Output files:
// - output/predictions.txt   # Per-frame prediction probabilities
// - output/scenes.txt        # Scene boundaries

predictions.txt format:
0.123456 0.234567 # single_pred many_pred
0.234567 0.345678
...
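
The step from per-frame probabilities to scene boundaries can be sketched as follows. This is a minimal sketch modeled on the thresholding rule of the reference TransNetV2 Python code; whether this C++ port applies exactly the same rule is an assumption. The function name `predictions_to_scenes`, the default threshold of 0.5, and the choice to feed it the first column (`single_pred`) are all illustrative and not taken from this project's sources.

```cpp
#include <utility>
#include <vector>

// Hypothetical helper (not part of the documented API): turns per-frame
// transition probabilities into inclusive [start_frame, end_frame] scenes.
std::vector<std::pair<int, int>> predictions_to_scenes(
    const std::vector<float>& predictions, float threshold = 0.5f) {
    std::vector<std::pair<int, int>> scenes;
    if (predictions.empty()) return scenes;

    const int n = static_cast<int>(predictions.size());
    bool prev = false;  // was the previous frame above the threshold?
    bool cur = false;   // is the current frame above the threshold?
    int start = 0;      // first frame of the scene currently being built

    for (int i = 0; i < n; ++i) {
        cur = predictions[i] > threshold;
        // A transition just ended, so a new scene starts at this frame.
        if (prev && !cur) start = i;
        // A transition just began, so the current scene ends at this frame.
        if (!prev && cur && i != 0) scenes.emplace_back(start, i);
        prev = cur;
    }
    // Close the last scene if the video does not end inside a transition.
    if (!cur) scenes.emplace_back(start, n - 1);
    // Fallback when every frame is above the threshold.
    if (scenes.empty()) scenes.emplace_back(0, n - 1);
    return scenes;
}
```

A lower threshold makes the detector more sensitive and yields more, shorter scenes; a higher one merges them.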
scenes.txt format:
0 64 # start_frame end_frame
65 154
155 244
...
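
For tools that consume scenes.txt, a small reader might look like the sketch below. It assumes only what the format above shows: one scene per line, two whitespace-separated integers. The `Scene` struct and `parse_scenes` function are hypothetical names, not part of this project.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical type: one inclusive frame range read from scenes.txt.
struct Scene {
    int start_frame;
    int end_frame;
};

// Reads "start_frame end_frame" pairs, one per line. Lines that do not begin
// with two integers, or whose range is inverted, are skipped. Anything after
// the two integers on a line is ignored.
std::vector<Scene> parse_scenes(std::istream& in) {
    std::vector<Scene> scenes;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        Scene s{};
        if (fields >> s.start_frame >> s.end_frame &&
            s.start_frame <= s.end_frame) {
            scenes.push_back(s);
        }
    }
    return scenes;
}
```

Passing a `std::ifstream` opened on `output/scenes.txt` reads the file written by transnet.exe; taking a `std::istream&` keeps the parser easy to test with an in-memory string.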
# Split the video into multiple segments
video_splitter.exe video.mp4 output/scenes.txt segments

Output naming convention:
scene_0000_0_64_video.mp4 # scene_ID_startframe_endframe_originalname.mp4
scene_0001_65_154_video.mp4
...
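
The naming convention above can be reproduced with a few lines of C++. This is a sketch of the pattern only; `scene_filename` is a hypothetical helper, and how video_splitter.exe builds the name internally is not shown in this README.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: builds "scene_<ID>_<start>_<end>_<name>.mp4", with the
// scene ID zero-padded to four digits. `original_stem` is the input file name
// without its extension, e.g. "video" for "video.mp4".
std::string scene_filename(int scene_id, int start_frame, int end_frame,
                           const std::string& original_stem) {
    char prefix[64];
    std::snprintf(prefix, sizeof(prefix), "scene_%04d_%d_%d_",
                  scene_id, start_frame, end_frame);
    return std::string(prefix) + original_stem + ".mp4";
}
```

The zero-padded ID keeps the segments in playback order when the output directory is sorted by name.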
┌─────────────────────────────────────────────────────┐
│                     Input Video                     │
└─────────────────────────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────┐
│                   Frame Splitting                   │
│           (Divide video into 4 segments)            │
└─────────────────────────────────────────────────────┘
                           │
      ┌─────────────┬──────┴──────┬─────────────┐
      ▼             ▼             ▼             ▼
┌───────────┐ ┌───────────┐ ┌───────────┐ ┌───────────┐
│ Thread 0  │ │ Thread 1  │ │ Thread 2  │ │ Thread 3  │
│  Frames   │ │  Frames   │ │  Frames   │ │  Frames   │
│  0-3129   │ │ 3130-6259 │ │ 6260-9389 │ │9390-12517 │
└─────┬─────┘ └─────┬─────┘ └─────┬─────┘ └─────┬─────┘
      │             │             │             │
      ▼             ▼             ▼             ▼
┌─────────────────────────────────────────────────────┐
│                TransNetV2 Inference                 │
│              (Parallel Model Forward)               │
└─────────────────────────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────┐
│                 Result Aggregation                  │
└─────────────────────────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────┐
│            predictions.txt + scenes.txt             │
└─────────────────────────────────────────────────────┘
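
The frame-splitting step in the diagram above can be sketched as a ceiling division of the frame count by the thread count. With 12518 frames and 4 threads this reproduces the ranges shown in the diagram, but the exact rule used by transnet.cpp is an assumption here, and `FrameRange` and `split_frames` are hypothetical names.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical type: an inclusive range of frame indices for one worker.
struct FrameRange {
    int first;
    int last;
};

// Splits [0, total_frames) into contiguous chunks of ceil(total / threads)
// frames each. The last chunk may be shorter, and fewer than `num_threads`
// ranges are returned when there are not enough frames to go around.
std::vector<FrameRange> split_frames(int total_frames, int num_threads) {
    std::vector<FrameRange> ranges;
    if (total_frames <= 0 || num_threads <= 0) return ranges;

    const int chunk = (total_frames + num_threads - 1) / num_threads;
    for (int first = 0; first < total_frames; first += chunk) {
        const int last = std::min(first + chunk, total_frames) - 1;
        ranges.push_back({first, last});
    }
    return ranges;
}
```

Contiguous ranges keep each worker's reads sequential within its own part of the file, which suits a design where every thread opens the video independently.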
┌───────────────────────────────────────┐
│              Input Video              │
└───────────────────────────────────────┘
                    │
                    ▼
┌───────────────────────────────────────┐
│              scenes.txt               │
│        (start_frame end_frame)        │
└───────────────────────────────────────┘
                    │
      ┌─────────────┼─────────────┐
      ▼             ▼             ▼
┌───────────┐ ┌───────────┐ ┌───────────┐
│ Segment 0 │ │ Segment 1 │ │ Segment 2 │ ...
│   Frame   │ │   Frame   │ │   Frame   │
│   0-64    │ │  65-154   │ │  155-244  │
└─────┬─────┘ └─────┬─────┘ └─────┬─────┘
      │             │             │
      ▼             ▼             ▼
┌───────────────────────────────────────┐
│       scene_0000_0_64_video.mp4       │
│      scene_0001_65_154_video.mp4      │
│     scene_0002_155_244_video.mp4      │
└───────────────────────────────────────┘
// Create/destroy a handle
void* transnet_create();
void transnet_destroy(void* handle);
// Load a video or a folder
int32_t transnet_load_video(void* handle, const char* video_path);
int32_t transnet_load_folder(void* handle, const char* folder_path);
// Set the thread count
void transnet_set_num_threads(void* handle, int32_t num_threads);
// Run inference
int32_t transnet_run_inference(void* handle);
// Get results
TransNetResults transnet_get_results(void* handle);
TransNetScenes transnet_get_scenes(void* handle, float threshold);
// Save results
int32_t transnet_save_results(void* handle, const char* output_dir);

#include "transnet_dll.h"
void* handle = transnet_create();
transnet_set_num_threads(handle, 4);
transnet_load_video(handle, "video.mp4");
transnet_run_inference(handle);
transnet_save_results(handle, "./output");
transnet_destroy(handle);

- Video frames are no longer loaded into memory all at once
- Each thread independently opens the video file and reads the frames it needs
- 4 threads process different video segments simultaneously
- Uses `std::thread` and `std::atomic` to manage concurrency
- Uses the H.264/AVC encoder (`avc1`) for compatibility
- Preserves the original video's FPS and resolution
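
The concurrency pattern named above (one `std::thread` per segment, shared state in a `std::atomic`) can be sketched as follows. The per-frame work is a placeholder: the real program would open the video, seek to the segment, decode frames, and run the model at that point. `process_in_parallel` is a hypothetical name, and the use of the atomic as a progress counter is an assumption about how it might be used.

```cpp
#include <atomic>
#include <thread>
#include <utility>
#include <vector>

// Runs one worker thread per inclusive [first, last] segment and returns the
// total number of frames visited across all workers.
long process_in_parallel(const std::vector<std::pair<int, int>>& segments) {
    std::atomic<long> frames_done{0};  // shared progress counter
    std::vector<std::thread> workers;
    workers.reserve(segments.size());

    for (const auto& seg : segments) {
        // Each worker copies its own segment and shares only the counter.
        workers.emplace_back([seg, &frames_done] {
            for (int frame = seg.first; frame <= seg.second; ++frame) {
                // Placeholder for decoding and inference on `frame`.
                frames_done.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    // join() makes every worker's updates visible before the final load.
    for (auto& worker : workers) worker.join();
    return frames_done.load();
}
```

Because the workers share nothing except the atomic counter, no mutex is needed. On some toolchains, building code that uses `std::thread` requires linking the platform thread library (for example `-pthread` with GCC or Clang).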
- LibTorch 2.0+ - PyTorch C++ library
- OpenCV 4.12.0 - computer vision library
- CMake 3.18+ - build tool
- Visual Studio 2022 or an equivalent compiler
- model_to_header.py - converts a PyTorch model to a C++ header file
| File | Role | Description |
|---|---|---|
| transnet.cpp | Main inference program | Shot detection entry point with multi-threaded inference |
| transnet_dll.cpp | DLL implementation | C ABI-compatible dynamic link library |
| video_splitter.cpp | Video splitter | Reads scenes.txt and cuts the video into segments |
| model_data.cpp | Embedded model | Embedded TransNetV2 model data |
| model_to_header.py | Model converter | Script that converts a PyTorch model to a C++ header file |
| test_model.py | Test script | Model testing and validation |
This project is based on the original TransNetV2 project and follows the same license.
- ✅ Dynamic input shape support
- ✅ 4-thread parallel inference
- ✅ Model embedded in the executable
- ✅ Automatic video splitting
- ✅ C ABI DLL interface
- ✅ Bilingual documentation (Chinese/English)
For questions or suggestions, please submit an Issue.