Unofficial PyTorch implementation starter for VideoDirector: Precise Video Editing via Text-to-Video Models (CVPR 2025).
If this repo saves you reading or reproduction time, please star it and follow @StaryMoon. I am building honest, open reproduction starters for recent CVPR papers.
This repository is an independent, unofficial, work-in-progress starter.
- Paper: VideoDirector: Precise Video Editing via Text-to-Video Models
- Venue: CVPR 2025
- Reproduction status: benchmarks have not been reproduced yet
- Relationship to authors: this repo is not official and is not affiliated with the paper authors.
This v0.1.0 starter implements a compact, readable scaffold inspired by the paper:
- temporal token encoder
- text/control token fusion
- cross-attention video block
- toy denoising objective
- smoke-test script
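The module list above can be made concrete with a small, dependency-free sketch of how the pieces might fit together. This is not the repo's code, and every name here (`sinusoidal_time_encoding`, `encode_frames`, `fuse_conditions`, `cross_attention`) is hypothetical. The sketch makes three assumptions. First, the temporal token encoder adds a Transformer-style sinusoidal encoding per frame. Second, text/control fusion is plain concatenation along the sequence axis. Third, the cross-attention block is single-head with no learned query/key/value projections, so the fused condition tokens act as both keys and values.

```python
import math
import random


def sinusoidal_time_encoding(num_frames, dim):
    """Hypothetical temporal encoding: one sinusoidal vector per frame index."""
    table = []
    for t in range(num_frames):
        row = []
        for i in range(dim):
            # Even/odd channel pairs share a frequency, as in the Transformer encoding.
            freq = 1.0 / (10000 ** ((i // 2 * 2) / dim))
            row.append(math.sin(t * freq) if i % 2 == 0 else math.cos(t * freq))
        table.append(row)
    return table


def encode_frames(frame_tokens):
    """Add the temporal encoding to each per-frame token (frames x dim)."""
    table = sinusoidal_time_encoding(len(frame_tokens), len(frame_tokens[0]))
    return [[x + p for x, p in zip(tok, pos)] for tok, pos in zip(frame_tokens, table)]


def fuse_conditions(text_tokens, control_tokens):
    """Assumed fusion: concatenate text and control tokens along the sequence axis."""
    return text_tokens + control_tokens


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, context):
    """Single-head scaled dot-product attention plus a residual connection.

    Video tokens are the queries; the fused condition tokens serve as both
    keys and values (no learned projections in this sketch).
    """
    dim = len(queries[0])
    scale = 1.0 / math.sqrt(dim)
    out = []
    for q in queries:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in context]
        weights = softmax(scores)
        attended = [sum(w * v[j] for w, v in zip(weights, context)) for j in range(dim)]
        out.append([qj + aj for qj, aj in zip(q, attended)])
    return out


rng = random.Random(0)
dim = 8
video = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(4)]    # 4 frames
text = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(3)]     # 3 text tokens
control = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(2)]  # 2 control tokens
out = cross_attention(encode_frames(video), fuse_conditions(text, control))
print(len(out), len(out[0]))  # 4 8
```

In PyTorch, the natural replacement for the list arithmetic is `torch.nn.MultiheadAttention` with `batch_first=True`, which adds learned projections and multiple heads.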
The goal is to make the high-level idea easy to inspect, fork, and improve.
Not included in v0.1.0:
- full video diffusion model
- VAE or latent video tokenizer
- large-scale training recipe
- generation-quality reproduction
Quick start:
git clone https://github.com/StaryMoon/VideoDirector-Unofficial.git
cd VideoDirector-Unofficial
pip install -r requirements.txt
python scripts/smoke_test.py
Expected output includes:
loss: ...
video: torch.Size([2, 8, 16, 64])
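The smoke test reports a `loss` from the toy denoising objective. The README does not spell out that objective's form. The sketch below therefore assumes a standard DDPM-style epsilon-prediction loss:

1. Sample a timestep t.
2. Noise the clean signal as x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps under a linear beta schedule.
3. Take the mean squared error between the true and predicted noise.

The helper names, the schedule constants, and the two baseline "models" are hypothetical.

```python
import math
import random


def linear_alpha_bars(num_steps, beta_start=1e-4, beta_end=0.02):
    """alpha_bar_t = prod_{s<=t} (1 - beta_s) for a linear beta schedule (assumed constants)."""
    alpha_bars, prod = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / max(num_steps - 1, 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


def add_noise(x0, eps, alpha_bar):
    """Forward process: x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps."""
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [a * x + b * e for x, e in zip(x0, eps)]


def toy_denoising_loss(x0, predict_noise, alpha_bars, rng):
    """Epsilon-prediction MSE at one uniformly sampled timestep."""
    t = rng.randrange(len(alpha_bars))
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = add_noise(x0, eps, alpha_bars[t])
    eps_hat = predict_noise(xt, t)
    return sum((e - h) ** 2 for e, h in zip(eps, eps_hat)) / len(x0)


rng = random.Random(0)
alpha_bars = linear_alpha_bars(1000)
x0 = [rng.uniform(-1.0, 1.0) for _ in range(64)]  # a flattened toy "video" latent


def zero_model(xt, t):
    """Hypothetical baseline that always predicts zero noise."""
    return [0.0] * len(xt)


def oracle_model(xt, t):
    """Hypothetical oracle that knows x0 and inverts the forward process."""
    a, b = math.sqrt(alpha_bars[t]), math.sqrt(1.0 - alpha_bars[t])
    return [(v - a * x) / b for v, x in zip(xt, x0)]


print(toy_denoising_loss(x0, zero_model, alpha_bars, rng))    # mean squared noise, about 1 in expectation
print(toy_denoising_loss(x0, oracle_model, alpha_bars, rng))  # essentially zero (floating-point error only)
```

The two baselines bracket what a learned denoiser should do. Predicting zero noise scores about the noise variance, while the oracle recovers the noise exactly. This makes them a quick sanity check for the printed loss.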
Minimal usage:
import torch
from videodirector_unofficial import UnofficialStarter
image = torch.rand(2, 3, 64, 64)
model = UnofficialStarter(kind="video")
out = model(image)
Roadmap:
- Replace toy modules with a closer implementation of the paper.
- Add dataset loader and config files.
- Add metric scripts and visualization.
- Reproduce a small benchmark or ablation table.
- Add pretrained weights once experiments are stable.
Topics: cvpr-2025, video-editing, text-to-video, diffusion, pytorch, unofficial-implementation
Please cite the original paper if you use the method. This repo is only an unofficial starter and does not replace the paper.
MIT License. The original paper and official materials remain owned by their respective authors / publishers.