Mini-DLSS (DLSS-inspired, not NVIDIA DLSS): temporal video super-resolution trained on public VSR datasets, evaluated with standard SR metrics (PSNR/SSIM) plus temporal stability checks, and exported to ONNX for reproducible deployment experiments.
Preview: LR input, bicubic, Temporal SR, and HR target on the local REDS-style validation/demo set.
Final evaluation artifacts from the Drive-trained temporal_vsr_5f_small_2x checkpoint are documented in results/FINAL_EVALUATION_DEMO_2026-06-02.md.
Best checkpoint:
results/runs/temporal_vsr_5f_small_2x/checkpoints/best.pt
| Method | PSNR-Y (dB) | SSIM-Y | tPSNR (dB) | diff_energy |
|---|---|---|---|---|
| Bicubic | 36.8033 | 0.9601 | 36.3441 | 0.0217 |
| Single-frame SR fast-cycle (300 steps) | 32.3755 | 0.8807 | 33.8817 | 0.0183 |
| Temporal SR 5f small | 38.3265 | 0.9604 | 36.6018 | 0.0201 |
These metrics are from a local Vimeo-derived REDS-style validation/demo set. They are useful for reproducible pipeline validation and demo comparison, but should not be cited as official REDS benchmark results.
The single-frame SR baseline uses results/runs/week3_single_frame_fast_localreds/checkpoints/best.pt, a 300-step fast-cycle checkpoint. It is included as a pipeline sanity baseline, not as a budget-matched ablation against the temporal checkpoint.
Recruiter/demo artifacts:
- Comparison videos: results/final/videos/final_temporal_vsr_5f_small_2x/
- Labeled 10-second comparison: results/final/videos/final_temporal_vsr_5f_small_2x/all_val_comparison_10s_labeled.mp4
- Final training logs: results/runs/temporal_vsr_5f_small_2x/train_log.jsonl, results/runs/temporal_vsr_5f_small_2x/val_metrics.jsonl
- Fixed 10-second PyTorch CPU demo: results/final/demo/temporal_vsr_5f_small_2x_10s.mp4 (34.941 ms/frame, measured by demo_video.py)
- ONNX export: results/onnx/temporal_vsr_5f_small_2x.onnx
- ONNX Runtime CPU latency: 21.589 ms/frame on the same 240-frame clip (results/final/tables/onnxruntime_temporal_vsr_5f_small_2x_latency.json)
- Target-relative temporal audit: results/audit_updated_metrics/tables/target_relative_temporal_comparison.md
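The latency figures above are reported in ms/frame. The timing code inside demo_video.py and bench_onnxruntime.py is not reproduced in this README, so the following is only a sketch of one common way to get such a number: a few untimed warm-up calls, then total wall-clock time divided by the frame count. Whether the reported numbers include video decoding, warm-up, or encoding is not stated here, and `ms_per_frame` and `infer` are hypothetical names.

```python
import time
from typing import Callable, Sequence


def ms_per_frame(infer: Callable[[object], object],
                 inputs: Sequence[object],
                 warmup: int = 3) -> float:
    """Average wall-clock latency per input in milliseconds (hypothetical helper)."""
    if not inputs:
        raise ValueError("need at least one input")
    # Warm-up calls are excluded from timing (lazy initialization, first-call allocations).
    for x in inputs[:warmup]:
        infer(x)
    start = time.perf_counter()
    for x in inputs:
        infer(x)
    elapsed = time.perf_counter() - start
    return 1000.0 * elapsed / len(inputs)


if __name__ == "__main__":
    # Stand-in "model" that sleeps 5 ms per frame; 240 frames is 10 s of video at 24 fps.
    fake_model = lambda _frame: time.sleep(0.005)
    print(f"{ms_per_frame(fake_model, list(range(240))):.3f} ms/frame")
```

In a real measurement, `infer` would wrap one forward pass (PyTorch or an ONNX Runtime session call) on one prepared input window.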
The final images and videos were generated by loading the trained
temporal_vsr_5f_small_2x checkpoint and running inference on five-frame LR
windows. In the comparison media, the panel order is:
LR input | Bicubic | Temporal SR prediction | HR target
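A four-panel frame in this order could be assembled roughly as follows. This sketch assumes every panel is an H x W x C NumPy array at HR resolution, except the LR input, which is enlarged with nearest-neighbor repetition so its pixels stay visible; the repository's actual composition code and its label overlay are not shown here, and `nearest_upsample` and `four_panel` are hypothetical names.

```python
import numpy as np


def nearest_upsample(lr: np.ndarray, scale: int) -> np.ndarray:
    """(H, W, C) -> (H*scale, W*scale, C) by repeating each pixel."""
    return np.repeat(np.repeat(lr, scale, axis=0), scale, axis=1)


def four_panel(lr, bicubic, pred, hr, scale: int = 2, gap: int = 4) -> np.ndarray:
    """LR input | Bicubic | Temporal SR prediction | HR target, separated by black gaps."""
    panels = [nearest_upsample(lr, scale), bicubic, pred, hr]
    for p in panels:
        if p.shape != hr.shape:
            raise ValueError(f"panel shape {p.shape} does not match HR shape {hr.shape}")
    spacer = np.zeros((hr.shape[0], gap, hr.shape[2]), dtype=hr.dtype)
    row = [panels[0]]
    for p in panels[1:]:
        row += [spacer, p]
    # Concatenate along the width axis to get one wide frame.
    return np.concatenate(row, axis=1)
```

Nearest-neighbor enlargement is a presentation choice: it shows the LR input at the same on-screen size as the other panels without adding interpolation of its own.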
There are two practical ways to present the project on a website:
- Static project showcase: embed the committed result images and videos. This is the simplest option for a portfolio or GitHub Pages site and does not require a GPU or Python server.
- Live model demo: deploy best.pt with a Python API/Gradio app, or host the exported ONNX model for browser inference. The checkpoint and ONNX files are intentionally ignored by Git, so they must be uploaded separately to the deployment platform.
Use these final artifacts for public presentation:
| Asset | Suggested website use |
|---|---|
| all_val_comparison_preview.png | Hero image, video poster, or social preview |
| all_val_comparison_10s_labeled.mp4 | Primary labeled comparison video |
| 240_comparison.mp4 | Short scene comparison |
| 245_comparison.mp4 | Short scene comparison |
| 252_comparison.mp4 | Short scene comparison |
| sample_0000.png | Full four-panel still |
| sample_0001.png | Full four-panel still |
| sample_0002.png | Full four-panel still |
| sample_0003.png | Full four-panel still |
| sample_0004.png | Full four-panel still or detail crop |
| sample_0005.png | Full four-panel still |
| sample_0006.png | Full four-panel still |
| sample_0007.png | Full four-panel still or detail crop |
| temporal_vsr_5f_small_2x_10s.mp4 | Output-only 2x inference demo |
The labeled 10-second comparison and preview image are the strongest primary evidence. The scene clips are very short, so use them as supporting media rather than the main demonstration. The output-only demo should be accompanied by the labeled comparison because it does not show the input or baselines.
Copy the selected files into the website's public asset directory and give them short, web-friendly names, for example:
public/projects/mini-dlss/
comparison-preview.png
comparison-10s.mp4
sample-0004.png
sample-0007.png
The generated MP4 uses the mp4v codec. Convert it to H.264 for more reliable
browser playback:
ffmpeg \
-i results/final/videos/final_temporal_vsr_5f_small_2x/all_val_comparison_10s_labeled.mp4 \
-c:v libx264 -crf 20 -pix_fmt yuv420p -movflags +faststart -an \
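comparison-10s.mp4
Embed the resulting video in an HTML page (in a React component, use JSX attribute names such as autoPlay and playsInline, and a style object instead of a style string):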
<figure>
<video
controls
autoplay
muted
loop
playsinline
preload="metadata"
poster="/projects/mini-dlss/comparison-preview.png"
style="display: block; width: 100%; height: auto;"
>
<source
src="/projects/mini-dlss/comparison-10s.mp4"
type="video/mp4"
/>
</video>
<figcaption>
LR input, bicubic, trained temporal SR prediction, and HR target.
</figcaption>
</figure>
Place the final measurements beside the media:
- 38.3265 dB PSNR-Y
- +1.5231 dB PSNR-Y over bicubic
- 36.6018 dB tPSNR
- 21.589 ms/frame ONNX Runtime CPU latency
- Five LR input frames and 2x output scale
Also state that these results use the local Vimeo-derived REDS-style validation/demo set and are not official REDS benchmark results.
To publish the static showcase with GitHub Pages:
- Create a docs/ directory containing an index.html page and a small docs/assets/mini-dlss/ media directory.
- Copy only the preview image, H.264 comparison video, and selected stills.
- Commit and push docs/.
- In the GitHub repository, open Settings > Pages.
- Select Deploy from a branch, then choose the default branch and the /docs folder.
This publishes the evidence already generated by the trained model. It does not run inference for website visitors.
For a live demonstration, the most direct deployment is a Hugging Face Space or another Python host:
- Upload results/runs/temporal_vsr_5f_small_2x/checkpoints/best.pt to private model storage or the deployment platform.
- Include models/, configs/temporal_small.toml, utils/, and the preprocessing logic used by demo_video.py.
- Build a Gradio interface that accepts a short LR video, constructs five-frame windows, runs the checkpoint, and returns the 2x MP4.
- Embed the deployed Space in the portfolio with an iframe or link to it from the static showcase.
For client-side inference, upload
results/onnx/temporal_vsr_5f_small_2x.onnx and its external .onnx.data
weights, load them with ONNX Runtime Web, and construct inputs with shape
[batch, 5, 3, height, width]. Browser inference avoids a Python server but
requires additional JavaScript video decoding, frame-window construction, and
output encoding.
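The frame-window step that a browser (or Python) client has to reproduce can be sketched in NumPy. This assumes each output frame is predicted from a window centered on it and that windows at the clip edges repeat the first or last frame; demo_video.py may pad differently, so check it before porting. `build_windows` is a hypothetical name, and the commented ONNX Runtime calls use the Python `onnxruntime` package rather than ONNX Runtime Web.

```python
import numpy as np

NUM_FRAMES = 5  # five LR frames per window, as used by temporal_vsr_5f_small_2x


def build_windows(frames: np.ndarray, num_frames: int = NUM_FRAMES) -> np.ndarray:
    """(T, 3, H, W) LR frames -> (T, num_frames, 3, H, W), one window per output frame.

    Window t is centered on frame t. Indices outside the clip are clamped, which
    repeats the first/last frame (an assumption about edge handling).
    """
    if num_frames % 2 == 0:
        raise ValueError("num_frames must be odd so each window has a center frame")
    t = frames.shape[0]
    half = num_frames // 2
    offsets = np.arange(-half, half + 1)                      # [-2, -1, 0, 1, 2]
    idx = np.clip(np.arange(t)[:, None] + offsets, 0, t - 1)  # (T, num_frames)
    return frames[idx]                                        # gather along the time axis


# Feeding windows to the exported model with the Python onnxruntime package:
#   import onnxruntime as ort
#   sess = ort.InferenceSession("results/onnx/temporal_vsr_5f_small_2x.onnx",
#                               providers=["CPUExecutionProvider"])
#   name = sess.get_inputs()[0].name              # read the input name, don't assume it
#   for window in build_windows(lr_frames):       # lr_frames: float32 (T, 3, H, W)
#       sr = sess.run(None, {name: window[None]})[0]  # window[None]: (1, 5, 3, H, W)
# Keep the .onnx.data file in the same directory as the .onnx file so the external
# weights resolve. Value range and normalization must match demo_video.py.
```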
- Task: temporal video super-resolution.
- Upscale factor: 2x (fixed for all experiments).
- Focus: ML pipeline, model training, benchmarking, and deployment.
- Out of scope (for now): game-engine integration and custom rendering pipelines.
- Train: Vimeo-90K Septuplets, using the local union of Kaggle subsets vimeo-90k-3 + vimeo-90k-4.
- Eval/demo: REDS-style validation set generated from Vimeo for local pipeline validation.
- Eval/demo paths: data/raw/reds/val_sharp, data/raw/reds/val_sharp_bicubic/X2, and data/splits/reds_val.txt.
- Important: the local eval/demo files use REDS-like IDs 240 to 259, but these are not official REDS validation results unless the paths are replaced with official REDS data and evaluation is rerun.
- Split rule: no overlapping source Vimeo sequences between train/val/test subsets.
- Local default setup in this repo uses:
  - train sequence root: data/raw/vimeo90k_union/sequence
  - train manifests: data/splits/vimeo90k_union_{train,val,test}.txt
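The split rule can be checked mechanically before training. This sketch assumes each manifest line holds one Vimeo-90K septuplet ID such as 00001/0266 and, by default, treats the whole ID as the sequence key; pass a coarser `key` if a "source sequence" is defined differently in this project. `sequence_keys`, `check_no_overlap`, and `load_manifests` are hypothetical helpers, not scripts in this repo.

```python
from pathlib import Path
from typing import Callable, Dict, Iterable, List, Set


def sequence_keys(lines: Iterable[str], key: Callable[[str], str] = lambda s: s) -> Set[str]:
    """Collect the sequence key of every non-empty manifest line."""
    return {key(line.strip()) for line in lines if line.strip()}


def check_no_overlap(splits: Dict[str, Iterable[str]],
                     key: Callable[[str], str] = lambda s: s) -> None:
    """Raise ValueError if any two splits share a sequence key."""
    keys = {name: sequence_keys(lines, key) for name, lines in splits.items()}
    names = list(keys)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            shared = keys[a] & keys[b]
            if shared:
                raise ValueError(
                    f"{a} and {b} share {len(shared)} sequences, e.g. {sorted(shared)[:3]}")


def load_manifests(pattern: str = "data/splits/vimeo90k_union_{}.txt") -> Dict[str, List[str]]:
    """Read the train/val/test manifests listed in this README."""
    return {s: Path(pattern.format(s)).read_text().splitlines() for s in ("train", "val", "test")}
```

From the repo root, `check_no_overlap(load_manifests())` either returns silently or names the first pair of splits that leak sequences into each other.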
- Baseline A: bicubic upscaling.
- Baseline B: single-image SR applied frame-by-frame. The current reported checkpoint is a fast-cycle pipeline baseline, not a budget-matched final ablation.
- Ours: BasicVSR-style temporal model.
- Training path:
  - Stage 1: BasicVSR-Small for pipeline validation and quick cycles.
  - Stage 2: Drive-trained temporal_vsr_5f_small_2x final run for the reported artifacts.
- Optional extension: temporal consistency loss and/or flow-assisted alignment.
- Primary quality metrics: PSNR and SSIM.
- Primary reporting domain: Y channel in YCbCr.
- Secondary sanity report: RGB PSNR/SSIM (optional).
- Border crop before PSNR/SSIM: crop scale pixels on each side (2 pixels for 2x).
- Temporal stability metric:
  - Preferred: flow-warped temporal PSNR (tPSNR) between consecutive outputs.
  - Fallback: frame-to-frame difference energy in low-texture regions.
  - Additional target-relative diagnostics, available after rerunning eval.py: target_diff_energy, diff_energy_delta, temporal_error_energy, target_tpsnr, and tpsnr_delta.
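A minimal sketch of Y-channel PSNR with the border crop, plus a plain frame-difference energy, follows. It assumes RGB frames as float arrays in [0, 1] and uses the BT.601 luma coefficients common in SR evaluation (the same ones as MATLAB's rgb2ycbcr). eval.py may differ in rounding, in how the low-texture mask is chosen, or in how diff_energy_delta is defined, so the function names and the delta definition below are hypothetical.

```python
from typing import Optional

import numpy as np


def rgb_to_y(img: np.ndarray) -> np.ndarray:
    """(H, W, 3) RGB in [0, 1] -> BT.601 luma in [16/255, 235/255]."""
    y = 16.0 + 65.481 * img[..., 0] + 128.553 * img[..., 1] + 24.966 * img[..., 2]
    return y / 255.0


def psnr_y(pred: np.ndarray, target: np.ndarray, scale: int = 2) -> float:
    """PSNR on the Y channel after cropping `scale` pixels from every border."""
    yp = rgb_to_y(pred.astype(np.float64))
    yt = rgb_to_y(target.astype(np.float64))
    if scale > 0:
        yp = yp[scale:-scale, scale:-scale]
        yt = yt[scale:-scale, scale:-scale]
    mse = float(np.mean((yp - yt) ** 2))
    return float("inf") if mse == 0.0 else float(10.0 * np.log10(1.0 / mse))


def diff_energy(frames: np.ndarray, mask: Optional[np.ndarray] = None) -> float:
    """Mean squared difference between consecutive frames.

    frames: (T, H, W) or (T, H, W, C). mask: optional (H, W) bool array selecting
    low-texture pixels; how such a mask is built is left open here.
    """
    d = np.diff(frames.astype(np.float64), axis=0)
    if mask is not None:
        d = d[:, mask]
    return float(np.mean(d ** 2))


def diff_energy_delta(pred_frames, target_frames, mask=None) -> float:
    """Hypothetical target-relative form: > 0 means the output changes more than HR."""
    return diff_energy(pred_frames, mask) - diff_energy(target_frames, mask)
```

Comparing against the HR target's own difference energy separates real flicker from motion that is legitimately present in the scene.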
- Report set:
  - Local REDS-style validation aggregate metrics.
  - Curated qualitative clips (side-by-side LR/Bicubic/Single-frame/Temporal/GT).
- Runtime target: Google Colab GPU for training.
- All runs are resumable from periodic checkpoints.
- Validation runs at a fixed interval during training.
- Two experiment tiers:
  - Fast cycle: short training budget for iteration/debugging.
  - Full cycle: long budget for final reporting.
- Seeds, configs, and checkpoints are tracked per run for reproducibility.
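Resumable training with periodic validation and checkpointing can be sketched without PyTorch. This is a simplified, hypothetical loop: it persists only the step counter and best metric as JSON, whereas a real run would also save model, optimizer, scheduler, and RNG state (for example with torch.save). The interval parameters mirror the val_interval and checkpoint_interval keys passed to train.py through --override for the final run; `load_state`, `save_state`, and `run` are hypothetical names.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Callable


def load_state(path: Path) -> dict:
    """Return the last saved state, or a fresh one if no checkpoint exists yet."""
    if path.exists():
        return json.loads(path.read_text())
    return {"step": 0, "best_metric": float("-inf")}


def save_state(path: Path, state: dict) -> None:
    """Write to a temp file, then rename, so a crash never leaves a half-written file."""
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(json.dumps(state))
    os.replace(tmp, path)


def run(path: Path, max_steps: int, val_interval: int, checkpoint_interval: int,
        train_step: Callable[[int], None], validate: Callable[[int], float]) -> dict:
    state = load_state(path)
    # Resume from the step after the last checkpoint; steps after it are redone.
    for step in range(state["step"] + 1, max_steps + 1):
        train_step(step)
        state["step"] = step
        if step % val_interval == 0:
            state["best_metric"] = max(state["best_metric"], validate(step))
        if step % checkpoint_interval == 0:
            save_state(path, state)
    return state


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        ckpt = Path(d) / "state.json"
        run(ckpt, 10, 5, 5, lambda s: None, lambda s: float(s))    # first session
        print(run(ckpt, 12, 5, 5, lambda s: None, lambda s: 0.0))  # {'step': 12, 'best_metric': 10.0}
```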
The final checkpoint metadata records step = 80000, best_metric = 38.3265, and a Drive training override with train.max_steps = 150000. The checked-in configs/temporal_small.toml defines the architecture and local defaults; reproduce the exact final training budget by applying the recorded override.
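Applying the recorded override amounts to merging a JSON object into the parsed TOML config. The sketch below assumes train.py merges nested tables key by key, so keys the override does not mention keep their TOML values; how train.py actually applies --override is not documented here, and `deep_merge` and the base values in the demo are hypothetical.

```python
import json


def deep_merge(base: dict, override: dict) -> dict:
    """Return a new dict: nested tables are merged, other values are replaced."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


# The override passed to train.py --override for the final Drive run.
FINAL_OVERRIDE = (
    '{"train":{"max_steps":150000,"val_interval":5000,"checkpoint_interval":5000},'
    '"eval":{"max_batches":50}}'
)

if __name__ == "__main__":
    # Hypothetical base values; in practice, parse configs/temporal_small.toml with
    # tomllib.load(open(path, "rb")) on Python 3.11+.
    base = {"train": {"max_steps": 1000, "lr": 2e-4}, "eval": {"max_batches": 10}}
    print(deep_merge(base, json.loads(FINAL_OVERRIDE)))
```

A recursive merge, rather than a top-level dict.update, keeps sibling keys such as a learning rate from being dropped when only part of a table is overridden.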
- Temporal model improves local REDS-style PSNR over bicubic and establishes a reproducible evaluation/demo package.
- A budget-matched single-frame baseline should be trained before making a final single-frame-vs-temporal ablation claim.
- ONNX export succeeds; PyTorch CPU demo inference runs on a fixed 10-second clip with reported latency (ms/frame) on the local machine.
- Repo includes reproducible train/eval/export/demo commands and final results artifacts.
- Week 1: lock protocol, metrics, splits, and run tracker format.
- Week 2: data ingestion, alignment checks, window sampling checks.
- Week 3: baseline training/eval and auto-generated markdown results table.
- Week 4: BasicVSR-Small training and validation loop stabilization.
- Week 5: quality-focused temporal training and reproducible best-checkpoint eval.
- Week 6: planned ablations (3/5/7 frames, loss variants, tiny speed model) and latency chart.
- Week 7: ONNX export + mp4 demo CLI.
- Week 8: final report page with metrics, videos, failure cases, and tradeoff analysis.
Completed final artifacts:
- bicubic
- single_frame_sr fast-cycle pipeline baseline
- temporal_vsr_5f_small_2x
Planned ablations / future work:
- Budget-matched single_frame_sr
- temporal_vsr_3f
- temporal_vsr_7f
- temporal_vsr_5f_l1_perceptual
- temporal_vsr_5f_tiny speed-oriented model
mini-dlss/
configs/
data/
scripts/
splits/
models/
metrics/
notebooks/
results/
tables/
videos/
train.py
eval.py
export_onnx.py
demo_video.py
README.md
Week 2 dataset check (clip counts + LR/HR alignment + temporal windows):
python data/scripts/freeze_vimeo_splits.py --vimeo-root /path/to/vimeo90k
python data/scripts/freeze_vimeo_union_splits.py \
--vimeo-a data/raw/kagglehub/datasets/wangsally/vimeo-90k-3/versions/1 \
--vimeo-b data/raw/kagglehub/datasets/wangsally/vimeo-90k-4/versions/1
python data/scripts/build_vimeo_union_root.py \
--seq-a data/raw/kagglehub/datasets/wangsally/vimeo-90k-3/versions/1/sequence \
--seq-b data/raw/kagglehub/datasets/wangsally/vimeo-90k-4/versions/1/sequence
python data/scripts/check_dataset.py \
--hr-root data/raw/reds/val_sharp \
--lr-root data/raw/reds/val_sharp_bicubic/X2 \
--manifest data/splits/reds_val.txt \
--num-frames 5 \
--scale 2
If official REDS is not available, build a local REDS-style val set from Vimeo for pipeline validation:
python data/scripts/create_reds_val_from_vimeo.py
Week 3 baseline eval (writes markdown table + example media):
python eval.py \
--config configs/temporal_small.toml \
--mode bicubic \
--save-videos 2 \
--save-images 8
Week 4/5 temporal training for the final reported model:
python train.py --config configs/temporal_small.toml \
--override '{"train":{"max_steps":150000,"val_interval":5000,"checkpoint_interval":5000},"eval":{"max_batches":50}}'
Week 5 reproducible best-checkpoint evaluation:
python eval.py \
--config configs/temporal_small.toml \
--mode model \
--checkpoint results/runs/temporal_vsr_5f_small_2x/checkpoints/best.pt \
--device cpu \
--output-dir results/final
Budget-matched single-frame baseline:
bash scripts/run_budget_matched_single_frame.sh configs/single_frame_budget_matched.toml cuda
This trains the single-frame model with num_frames=1 and evaluates it with a 5-frame center-window override so the reported sample set matches the temporal model.
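The center-window override can be pictured as slicing the middle frame out of each five-frame window, so the single-frame model is scored on exactly the frames the temporal model predicts. This sketch assumes windows are stored as (batch, frames, channels, height, width), matching the [batch, 5, 3, height, width] ONNX input layout; `center_frame` is a hypothetical helper, not the script's code.

```python
import numpy as np


def center_frame(windows: np.ndarray) -> np.ndarray:
    """(B, N, C, H, W) windows with odd N -> (B, 1, C, H, W) center frames."""
    n = windows.shape[1]
    if n % 2 == 0:
        raise ValueError("window length must be odd to have a center frame")
    mid = n // 2
    # Slice rather than index so the frame axis is kept as length 1;
    # use [:, mid] instead if a model expects (B, C, H, W).
    return windows[:, mid:mid + 1]
```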
Week 6 planned ablation tracking:
bash scripts/run_ablations.sh \
data/raw/vimeo90k_union/sequence \
data/raw/reds/val_sharp \
data/raw/reds/val_sharp_bicubic/X2 \
50000 \
cuda
Official REDS evaluation, after official HR/LR roots and manifest are available locally:
bash scripts/eval_official_reds.sh \
/path/to/official/REDS/val_sharp \
/path/to/official/REDS/val_sharp_bicubic/X2 \
/path/to/official_reds_val.txt \
results/runs/temporal_vsr_5f_small_2x/checkpoints/best.pt \
results/official_reds \
cpu
Week 7 ONNX export:
python export_onnx.py \
--config configs/temporal_small.toml \
--checkpoint results/runs/temporal_vsr_5f_small_2x/checkpoints/best.pt \
--output results/onnx/temporal_vsr_5f_small_2x.onnx
Week 7 ONNX Runtime latency benchmark:
python bench_onnxruntime.py \
--onnx results/onnx/temporal_vsr_5f_small_2x.onnx \
--input results/week3/latency_input_10s_lr.mp4 \
--config configs/temporal_small.toml \
--output-json results/final/tables/onnxruntime_temporal_vsr_5f_small_2x_latency.json
Week 7 PyTorch CPU demo inference on the fixed 10-second LR MP4:
python demo_video.py \
--input results/week3/latency_input_10s_lr.mp4 \
--output results/final/demo/temporal_vsr_5f_small_2x_10s.mp4 \
--config configs/temporal_small.toml \
--checkpoint results/runs/temporal_vsr_5f_small_2x/checkpoints/best.pt \
--device cpu \
--fps 24
- Keep scale=2 in all configs.
- Use only manifests in data/splits/.
- Log every run in results/tables/experiment_tracker_template.md.
- Report Y-channel metrics with crop border = 2.
- Use the same fixed 10-second clip when recording PyTorch CPU latency (ms/frame).
- Report official REDS metrics only after replacing the local REDS-style generated data with official REDS data and rerunning evaluation.
