dan64/vs-cmnet2


vs-cmnet2

VapourSynth filter for exemplar-based video colorization using CMNET2.

Colorizes black-and-white clips by propagating color from reference frames using the CMNET2 deep learning model with a sliding permanent-memory window.


Installation

Download the latest wheel from Releases and install:

pip install vscmnet2-1.0.0-py3-none-any.whl

Plugins setup

Download plugins_win.zip from Release v1.0.0 and extract it into vscmnet2/plugins/. The resulting tree should look like this:

vscmnet2/plugins/
├── Support/
│   ├── TCanny.dll          # Edge detection
│   └── akarin.dll          # Expression evaluation
├── MiscFilter/MiscFilters/
│   └── MiscFilters.dll     # Scene-change detection (SCDetect)
└── SourceFilter/LSmashSource/
    ├── LSMASHSource.dll     # Video file reader
    ├── vcruntime140.dll
    └── vcruntime140_1.dll

Model weights

See Model Weights below.


Requirements

  • Python ≥ 3.12
  • VapourSynth ≥ R74
  • CUDA-capable GPU with PyTorch ≥ 2.9.1
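
The version minimums above can be checked from a Python prompt. The helper below is a minimal sketch and not part of vscmnet2; `version_tuple` and `meets_minimum` are hypothetical names. It compares only the leading numeric components, so a local suffix such as `+cu130` is ignored, and it avoids the string-comparison trap where "2.10" sorts before "2.9.1".

```python
import re


def version_tuple(version: str) -> tuple[int, ...]:
    """Extract the leading dotted numeric part of a version string.

    For example "2.10.0+cu130" becomes (2, 10, 0); anything after the
    numeric part is ignored.
    """
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"no numeric version found in {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(installed: str, minimum: str) -> bool:
    """Return True if `installed` is at least `minimum` (numeric comparison)."""
    return version_tuple(installed) >= version_tuple(minimum)
```

With PyTorch installed, `meets_minimum(torch.__version__, "2.9.1")` together with `torch.cuda.is_available()` covers the last requirement, and `sys.version_info >= (3, 12)` covers the first.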

Model Weights

Download the following files from the CMNET2 v1.0.0 Release and place them in the correct directories under vscmnet2/:

| File | Destination |
|------|-------------|
| DINOv2FeatureV6_LocalAtten_s2_154000.pth | vscmnet2/weights/ |
| dinov2_vits14_pretrain.pth | vscmnet2/models/checkpoints/ |
| resnet18-5c106cde.pth | vscmnet2/models/checkpoints/ |
| resnet50-19c8e357.pth | vscmnet2/models/checkpoints/ |

Note: The DINOv2 source code (facebookresearch_dinov2_main/) is already included in this repository under vscmnet2/models/.
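
Since a misplaced weight file is an easy mistake to make, a small check can confirm the layout before the first run. This is a sketch under the assumption that the four files listed above are the only ones needed; `missing_weights` and `EXPECTED_WEIGHTS` are illustrative names, not part of vscmnet2.

```python
from pathlib import Path

# File name -> destination directory relative to the vscmnet2 package root,
# copied from the list of model weights above.
EXPECTED_WEIGHTS = {
    "DINOv2FeatureV6_LocalAtten_s2_154000.pth": "weights",
    "dinov2_vits14_pretrain.pth": "models/checkpoints",
    "resnet18-5c106cde.pth": "models/checkpoints",
    "resnet50-19c8e357.pth": "models/checkpoints",
}


def missing_weights(package_root) -> list[str]:
    """Return the relative paths of expected weight files that are absent."""
    root = Path(package_root)
    missing = []
    for filename, subdir in EXPECTED_WEIGHTS.items():
        relative = Path(subdir) / filename
        if not (root / relative).is_file():
            missing.append(relative.as_posix())
    return missing
```

If the package is installed as a regular directory, `Path(vscmnet2.__file__).parent` is a reasonable guess for `package_root`; an empty result means every file was found.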

Install spatial_correlation_sampler

Release 1.0.0 includes an archive with a precompiled build (PyTorch 2.10 + CUDA 13.0) of Pytorch-Correlation-extension, which vscmnet2 requires for temporal alignment during encoding:

pip install spatial_correlation_sampler-0.5.0-cp312-cp312-win_amd64.whl

The wheel is pre-built for Python 3.12 / PyTorch 2.10+cu130 / Windows x64 and only works with that exact combination. For any other environment, build the wheel from source.
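
Part of that compatibility constraint is visible in the wheel file name itself. The sketch below reads the three compatibility tags from the name, following the standard wheel naming convention, and compares the Python tag with the running interpreter. `parse_wheel_tags` and `interpreter_matches` are hypothetical helper names. The PyTorch and CUDA versions are not encoded in the file name, so this check cannot confirm them.

```python
import sys


def parse_wheel_tags(filename: str) -> tuple[str, str, str]:
    """Split a wheel file name into (python tag, ABI tag, platform tag).

    Wheel names follow name-version[-build]-python-abi-platform.whl, so the
    three compatibility tags are the last three dash-separated fields.
    """
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file name: {filename!r}")
    fields = filename[: -len(".whl")].split("-")
    if len(fields) < 5:
        raise ValueError(f"unexpected wheel file name: {filename!r}")
    python_tag, abi_tag, platform_tag = fields[-3:]
    return python_tag, abi_tag, platform_tag


def interpreter_matches(filename: str) -> bool:
    """Check only the CPython version tag against the running interpreter."""
    python_tag, _, _ = parse_wheel_tags(filename)
    current = f"cp{sys.version_info.major}{sys.version_info.minor}"
    return python_tag == current
```

For the wheel above, the tags come out as `cp312`, `cp312` and `win_amd64`, which is why it installs only on CPython 3.12 for 64-bit Windows.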

DiT model (optional, for vs_cmnet2dit)

The DiT path relies on a DiT Engine Server that runs as a separate process. Start the server with a Nunchaku SVD quant model, then connect with:

clip = vs_cmnet2dit(clip, dit_engine_params={"host": "127.0.0.1", "port": 8765})
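
Because the filter depends on a separately started server, it can help to confirm that something is listening before loading the script. The sketch below assumes the DiT Engine Server accepts plain TCP connections on the configured host and port; it does not speak the server's protocol, and `server_reachable` is an illustrative name, not a vscmnet2 function.

```python
import socket


def server_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        # The connection is closed again as soon as the block exits.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolvable hosts.
        return False
```

Calling `server_reachable("127.0.0.1", 8765)` with the values from the example above returns False when the server has not been started yet.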

Usage

Basic colorization with external reference clip

from vscmnet2 import vs_cmnet2
clip = vs_cmnet2(clip, clip_ref=ref_clip, method=6)

Reference frames from a directory

Reference frames are read from a folder. Files must be named ref_NNNNNN.png (e.g. ref_000897.png).

clip = vs_cmnet2(clip, sc_framedir="/path/to/refs", method=4)
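
The naming rule can be checked before rendering. The sketch below assumes that NNNNNN means exactly six digits and that the number is the frame index the reference belongs to; both are readings of the example name, not confirmed behavior. `ref_frame_number` and `list_ref_frames` are hypothetical helpers.

```python
import re
from pathlib import Path

# "ref_" followed by exactly six digits, as in the example ref_000897.png.
REF_NAME = re.compile(r"ref_(\d{6})\.png")


def ref_frame_number(path):
    """Return the frame number encoded in a reference file name, or None."""
    match = REF_NAME.fullmatch(Path(path).name)
    return int(match.group(1)) if match else None


def list_ref_frames(directory) -> dict[int, Path]:
    """Map frame numbers to the matching reference files in a directory."""
    frames = {}
    for entry in sorted(Path(directory).iterdir()):
        number = ref_frame_number(entry)
        if number is not None:
            frames[number] = entry
    return frames
```

Files that do not match the pattern are skipped, so comparing the result with a plain directory listing shows which names would need fixing.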

Custom render speed and retry

clip = vs_cmnet2(
    clip,
    clip_ref=ref_clip,
    method=0,
    render_speed="slow",
    render_vivid=True,
    max_memory_frames=40,
    retry_threshold=0.35,
    retry_model=1,            # DiT fp4 model
)

DiT-based colorization

from vscmnet2 import vs_cmnet2dit

clip = vs_cmnet2dit(
    clip,
    dit_engine_params={
        "host": "127.0.0.1",
        "port": 8765,
    },
    max_memory_frames=20,
)

Re-color a range of frames

Re-colorizes only the frames between two reference frames, leaving the rest unchanged. Useful for fixing specific sections of an already colored clip.

from vscmnet2 import vs_cmnet2_recolor

clip = vs_cmnet2_recolor(
    clip,
    ref_framedir="/path/to/refs",
    ref_start_path="/path/to/refs/ref_000100.png",
    ref_end_path="/path/to/refs/ref_000200.png",
    method=4,
    max_memory_frames=20,
)
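
Before calling the filter it can be useful to confirm that the two reference paths describe a forward range. The sketch below derives the frame indices from the file names; it assumes the digits in ref_NNNNNN.png are frame indices, and it makes no claim about whether the filter treats the boundary frames as part of the range. `frame_index` and `recolor_span` are illustrative names.

```python
import re
from pathlib import PurePath


def frame_index(ref_path: str) -> int:
    """Read the frame index from a ref_NNNNNN.png path."""
    match = re.fullmatch(r"ref_(\d+)\.png", PurePath(ref_path).name)
    if match is None:
        raise ValueError(f"not a reference frame name: {ref_path!r}")
    return int(match.group(1))


def recolor_span(ref_start_path: str, ref_end_path: str) -> tuple[int, int]:
    """Return (start, end) frame indices, rejecting reversed or empty ranges."""
    start = frame_index(ref_start_path)
    end = frame_index(ref_end_path)
    if start >= end:
        raise ValueError(f"start frame {start} must precede end frame {end}")
    return start, end
```

For the paths in the example above this yields the pair 100 and 200.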

Read external video

from vscmnet2 import vs_read_video

clip = vs_read_video("/path/to/video.mkv")

Key Parameters

vs_cmnet2_recolor

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| clip | VideoNode | | Already colorized clip to re-color |
| method | int | 4 | 3=ref same as video, 4=ref different from video |
| render_speed | str | "auto" | auto, fast, medium, slow, slower |
| render_vivid | bool | False | +15% saturation boost |
| ref_framedir | str | | Directory with reference frames (format: ref_NNNNNN.png) |
| ref_start_path | str | | First reference frame to re-color from |
| ref_end_path | str | | Last reference frame to re-color to |
| max_memory_frames | int | 0 (→20) | Permanent-memory window size (even, 10–500) |
| retry_threshold | float | 0.0 | Retry trigger (0.0=disabled; suggest 0.20–0.35) |
| retry_model | int | 1 | 1=DiT fp4, 2=DiT int4 |
| torch_dir | str | model dir | Torch hub cache location |

vs_cmnet2

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| clip | VideoNode | | B&W input clip |
| clip_ref | VideoNode | None | Reference clip (methods 5 and 6) |
| method | int | 0 | Reference frame generation: 3-4=external, 5-6=clipRef |
| render_speed | str | "auto" | auto, fast, medium, slow, slower |
| render_vivid | bool | False | +15% saturation boost |
| encode_mode | int | 0 | 0=remote (recommended), 1=local |
| max_memory_frames | int | 0 (→20) | Permanent-memory window size (even, 10–500) |
| ref_mode | int | 1 | 0=direct folder, 1=VS clips |
| retry_threshold | float | 0.0 | Retry trigger (0.0=disabled; suggest 0.20–0.35) |
| retry_model | int | 0 | 0=DeOldify+DDColor, 1=DiT fp4, 2=DiT int4 |
| torch_dir | str | model dir | Torch hub cache location |
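
The constraints on max_memory_frames (0 selects the default of 20, otherwise an even value from 10 to 500) can be expressed as a small pre-check. This is a sketch of the documented rule only: whether the filter itself raises, clamps or rounds an invalid value is not stated here, and raising is an assumption of this example. `resolve_memory_frames` is a hypothetical name.

```python
DEFAULT_MEMORY_FRAMES = 20  # value the parameter table gives for 0
MIN_MEMORY_FRAMES = 10
MAX_MEMORY_FRAMES = 500


def resolve_memory_frames(value: int) -> int:
    """Return the effective window size for a max_memory_frames setting."""
    if value == 0:
        return DEFAULT_MEMORY_FRAMES
    if not MIN_MEMORY_FRAMES <= value <= MAX_MEMORY_FRAMES:
        raise ValueError(
            f"max_memory_frames must be 0 or within "
            f"{MIN_MEMORY_FRAMES}-{MAX_MEMORY_FRAMES}, got {value}"
        )
    if value % 2 != 0:
        raise ValueError(f"max_memory_frames must be even, got {value}")
    return value
```

Running the check in the script before calling the filter surfaces a bad value with a clear message instead of relying on whatever the filter does with it.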

vs_cmnet2dit

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| clip | VideoNode | | B&W input clip |
| sc_thresh | float | 0.035 | Scene-detect threshold |
| sc_min_int | int | 25 | Min frame distance between scene changes |
| max_memory_frames | int | 0 (→20) | Permanent-memory window (even, pair-wise) |
| dit_engine_params | dict | None | DiT Engine Server connection |

Model Architecture

CMNET2 (Colorization Memory Network v2) is an exemplar-based video colorization model. It maintains a sliding permanent memory of reference frames and propagates color through a space-time memory network. The architecture uses:

  • DINOv2 ViT-S/14 as the key encoder backbone
  • ResNet-18 and ResNet-50 as value encoders
  • LocalGatedPropagation for attention-based memory readout
  • CBAM (Convolutional Block Attention Module) for feature refinement
  • KeyValueMemoryStore with top-k readout for efficient retrieval

The DiT variant offloads reference-frame colorization to an external DiT (Diffusion Transformer) model running in a separate RPC server process.
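
To make the idea of a key-value memory with top-k readout concrete, here is a generic, dependency-free illustration of the technique. It is not the CMNET2 implementation, which works on feature tensors with its own similarity measure. Here a query is compared with every stored key by dot product, only the k best matches are kept, and their values are blended with softmax weights. `topk_readout` is an illustrative name.

```python
import math


def topk_readout(query, keys, values, k):
    """Blend the values of the k memory entries most similar to `query`.

    Similarity is a plain dot product. The k best scores are turned into
    weights with a softmax; every other entry gets weight zero.
    """
    if not keys or len(keys) != len(values):
        raise ValueError("keys and values must be non-empty and equally long")
    if k < 1:
        raise ValueError("k must be at least 1")
    scores = [sum(q * x for q, x in zip(query, key)) for key in keys]
    # Indices of the k highest-scoring memory entries.
    top = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)[:k]
    # Numerically stable softmax restricted to the selected entries.
    peak = max(scores[i] for i in top)
    weights = [math.exp(scores[i] - peak) for i in top]
    total = sum(weights)
    dim = len(values[0])
    return [
        sum(w * values[i][d] for w, i in zip(weights, top)) / total
        for d in range(dim)
    ]
```

Restricting the softmax to k entries keeps the readout cost and the influence of weak matches bounded as the memory grows, which is the usual motivation for top-k retrieval.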


Project Structure

vscmnet2/
├── __init__.py          # Main VapourSynth wrapper (vs_cmnet2, vs_cmnet2dit, vs_merge, vs_read_video)
├── cmnet2_utils.py      # Format conversion, luma protection, video I/O
├── colormnet2/          # CMNET2 core (colorization engine)
│   ├── __init__.py      # vs_colormnet2_local / vs_colormnet2_remote
│   ├── colormnet2_render.py   # Render class (ColorMNetRender2)
│   ├── colormnet2_server.py   # XML-RPC server
│   ├── colormnet2_client.py   # XML-RPC client
│   ├── model/           # Neural network modules
│   │   ├── network.py   # ColorMNet (top-level nn.Module)
│   │   ├── resnet.py    # ResNet backbone with DINOv2 key encoder
│   │   ├── modules.py   # Key/value encoders, decoder, memory read
│   │   ├── attention.py # LocalGatedPropagation
│   │   └── ...
│   └── inference/       # Inference core, memory manager
├── vsslib/              # Shared VapourSynth utility library
│   ├── vsmodels.py      # Model dispatchers (vs_colormnet2, vs_colormnet2dit)
│   ├── vsimage_engine.py   # DiT engine / DeOldify+DDColor fallback
│   ├── vsplugins.py     # VapourSynth plugin loaders
│   ├── vsfilters.py     # VapourSynth filter functions (merge, tweak, etc.)
│   ├── vsscdect.py      # Scene-change detection
│   ├── vsscdetect_edge.py  # Edge-based scene detection
│   └── ...
├── weights/             # CMNET2 model weights
├── models/
│   ├── checkpoints/     # Backbone weights (DINOv2, ResNet)
│   └── facebookresearch_dinov2_main/  # DINOv2 source
└── plugins/             # VapourSynth .dll plugins (from plugins_win.zip)

Credits


License

MIT
