LipSyncNet (LSN) — Capstone

PyTorch re-implementation of LipSyncNet (3D-CNN + EfficientNet-B0 + temporal backbone + CTC). This repo trains and evaluates three backend variants on GRID and reports cross-dataset transfer to LRS2.
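To make the CTC stage named above more concrete, here is a minimal sketch of greedy (best-path) CTC decoding in plain Python: take the top-scoring class per frame, merge consecutive repeats, then drop blanks. The function name `ctc_greedy_decode`, the blank index 0, and the toy vocabulary are assumptions for illustration only; the repo's own decoders live in src/lsn/evaluation/ and may work differently.

```python
from typing import Sequence


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]],
                      vocab: Sequence[str],
                      blank: int = 0) -> str:
    """Best-path CTC decoding: argmax per frame, merge repeats, drop blanks."""
    # Index of the highest-scoring class at each time step.
    best_path = [max(range(len(scores)), key=scores.__getitem__)
                 for scores in frame_scores]
    decoded = []
    previous = None
    for index in best_path:
        # Emit a symbol only if it differs from the previous frame and is not blank.
        if index != previous and index != blank:
            decoded.append(vocab[index])
        previous = index
    return "".join(decoded)


# Toy example: index 0 is the (assumed) blank symbol.
VOCAB = ["-", "a", "b"]
SCORES = [
    [0.1, 0.8, 0.1],    # a
    [0.1, 0.7, 0.2],    # a (repeat, merged with the previous frame)
    [0.9, 0.05, 0.05],  # blank
    [0.1, 0.8, 0.1],    # a (new symbol, because a blank separates it)
    [0.2, 0.1, 0.7],    # b
]
print(ctc_greedy_decode(SCORES, VOCAB))  # prints "aab"
```

The blank frame is what lets the decoder output a doubled letter: without it, the two runs of "a" would collapse into one.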

Source: refactored from notebooks/legacy/LSN_TRAINING_EVAL.ipynb — see docs/superpowers/specs/2026-04-29-lsn-notebook-to-codebase-design.md for the full design.

Results

(Generated by python scripts/report.py. Committed PNGs / CSVs live in results/.)

results/learning_curve_run_paper_v1.png — paper-faithful run
results/learning_curve_run_identity_v1.png — no-temporal-backbone ablation
results/learning_curve_run_transformer_v1.png — per-stream Transformer encoder
results/learning_curves_comparison.png — all three overlaid (val loss)
results/results_table_grid.csv — Table 5 reproduction (CER, WER, word-acc, sentence-acc per model + paper baselines)
results/qualitative_examples_grid.csv — Table 6 reproduction
results/results_table_lrs2.csv — cross-dataset transfer
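For readers unfamiliar with the CER and WER columns in these tables, the sketch below shows the usual definition: Levenshtein edit distance between reference and hypothesis, divided by the reference length (in characters for CER, in words for WER). This is a generic illustration, not the repo's metric code; the helper names and the normalization choice are assumptions.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance between two sequences, using one rolling row."""
    row = list(range(len(hyp) + 1))
    for i, ref_item in enumerate(ref, start=1):
        # `diagonal` holds the value from the previous row, previous column.
        diagonal, row[0] = row[0], i
        for j, hyp_item in enumerate(hyp, start=1):
            cost = 0 if ref_item == hyp_item else 1
            diagonal, row[j] = row[j], min(
                row[j] + 1,       # deletion
                row[j - 1] + 1,   # insertion
                diagonal + cost,  # substitution or match
            )
    return row[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: character edits / reference characters."""
    return edit_distance(ref, hyp) / max(len(ref), 1)


def wer(ref: str, hyp: str) -> float:
    """Word error rate: word edits / reference words."""
    ref_words = ref.split()
    return edit_distance(ref_words, hyp.split()) / max(len(ref_words), 1)


reference = "bin blue at f two now"
hypothesis = "bin blue at f too now"
print(f"CER={cer(reference, hypothesis):.4f}  WER={wer(reference, hypothesis):.4f}")
```

In this example one wrong character out of 21 gives a CER of 1/21, while the same mistake counts as one wrong word out of six, so WER is 1/6.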

Setup — local

Requires Python ≥3.10 and PyTorch ≥2.1. CUDA is optional (CPU works for inference).

git clone <repo-url>
cd <repo>
pip install -e ".[dev]"   # quotes keep zsh from globbing the brackets

The following extras are needed only if you plan to run preprocessing:

pip install -e ".[preprocessing]"
# then download the dlib landmark model:
wget http://dlib.net/files/shape_predictor_68_face_landmarks.dat.bz2
bzip2 -dk shape_predictor_68_face_landmarks.dat.bz2

The .npz clips (see docs/data-format.md for format) are produced by scripts/preprocess.py (requires the [preprocessing] extras above).

Setup — Colab

Open a fresh Colab notebook with a GPU runtime. Run these setup cells:

# 1. Clone & install
!git clone <repo-url> lsn
%cd lsn
!pip install -e .

# 2. Mount Drive (data lives here)
from google.colab import drive
drive.mount('/content/drive')

# 3. HF token (only if you'll push checkpoints to Hub)
from google.colab import userdata
import os
os.environ['HF_TOKEN'] = userdata.get('HF_TOKEN')

Then run the CLI:

!python scripts/train.py --config configs/identity.yaml \
    --data-dir /content/drive/MyDrive/LSN_Data/grid_processed_new \
    --hf-repo ranro1/lipsyncnet-checkpoints

Setup — Kaggle

Add the GRID dataset to your notebook (Add data → search by name). Add HF_TOKEN under Add-ons → Secrets and enable it for this notebook. Then, in the notebook:

!git clone <repo-url> /kaggle/working/lsn
%cd /kaggle/working/lsn
!pip install -e .

from kaggle_secrets import UserSecretsClient
import os
os.environ['HF_TOKEN'] = UserSecretsClient().get_secret('HF_TOKEN')

!python scripts/train.py --config configs/identity.yaml \
    --data-dir /kaggle/input/<your-grid-dataset>/grid_processed_new \
    --hf-repo ranro1/lipsyncnet-checkpoints

Note: Kaggle commits time out at ~9h. The codebase resumes from the most recent last_checkpoint.pt automatically when --hf-repo is set. Disk requirement: ~5 GB working space for the largest checkpoint.

Usage

Preprocess GRID videos → .npz clips

python scripts/preprocess.py \
    --video-dir /path/to/grid_videos \
    --align-dir /path/to/grid_align \
    --output-dir /path/to/grid_processed_new \
    --landmark-path shape_predictor_68_face_landmarks.dat \
    --speakers s1 s2 s3 s4 s5

Both --video-dir and --align-dir must contain one subdirectory per speaker (e.g. s1/, s2/, ...). Output .npz files are written under --output-dir in the same layout, which is then passed directly to --data-dir for training.
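Before a long preprocessing run, it can help to confirm that both input roots really have the per-speaker layout described above. The following is a minimal sketch of such a check; `missing_speaker_dirs` is a hypothetical helper written for this example and is not part of the repo's CLI.

```python
import tempfile
from pathlib import Path
from typing import Iterable, List


def missing_speaker_dirs(video_dir: str, align_dir: str,
                         speakers: Iterable[str]) -> List[str]:
    """Return the '<root>/<speaker>' directories that do not exist."""
    speaker_list = list(speakers)  # allow generators; we iterate twice
    missing = []
    for root in (Path(video_dir), Path(align_dir)):
        for speaker in speaker_list:
            candidate = root / speaker
            if not candidate.is_dir():
                missing.append(str(candidate))
    return missing


# Demo on a throwaway tree: align/s2 is deliberately left out.
with tempfile.TemporaryDirectory() as tmp:
    for sub in ("videos/s1", "videos/s2", "align/s1"):
        (Path(tmp) / sub).mkdir(parents=True)
    demo_missing = missing_speaker_dirs(
        str(Path(tmp) / "videos"), str(Path(tmp) / "align"), ["s1", "s2"])
    demo_relative = [Path(p).relative_to(tmp).as_posix() for p in demo_missing]

print(demo_relative)  # prints ['align/s2']
```

An empty list means every speaker passed to --speakers has a subdirectory under both roots.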

Train one experiment

python scripts/train.py --config configs/identity.yaml --data-dir <data-dir>

Three committed configs reproduce the three trained models: configs/identity.yaml, configs/paper.yaml, configs/transformer.yaml.
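To train all three models back to back, one option is a small driver that builds one train.py command per committed config. The sketch below only assembles and prints the commands, using the --config and --data-dir flags shown above; `build_train_commands` is a hypothetical helper, and the data path is a placeholder.

```python
import shlex
import sys
from typing import List, Sequence

# The three configs named in this README.
CONFIGS = (
    "configs/identity.yaml",
    "configs/paper.yaml",
    "configs/transformer.yaml",
)


def build_train_commands(data_dir: str,
                         configs: Sequence[str] = CONFIGS) -> List[List[str]]:
    """Build one scripts/train.py invocation per config file."""
    return [
        [sys.executable, "scripts/train.py", "--config", config, "--data-dir", data_dir]
        for config in configs
    ]


for command in build_train_commands("/path/to/grid_processed_new"):
    print(shlex.join(command))
    # To launch the runs sequentially instead of printing them:
    #   import subprocess
    #   subprocess.run(command, check=True)
```

Passing each command as a list avoids shell-quoting problems if the data directory contains spaces.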

Inference (writes JSON predictions)

python scripts/infer.py --config configs/identity.yaml \
    --weights results/checkpoints/run_identity_v1/best_model.pt \
    --dataset grid --data-dir <test-data-dir>

For LRS2 cross-dataset transfer:

python scripts/infer.py --config configs/identity.yaml \
    --weights results/checkpoints/run_identity_v1/best_model.pt \
    --dataset lrs2 --data-dir <lrs2-test-dir>

Report (plots + CSV tables)

python scripts/report.py --predictions-dir results/predictions

Project layout

Path	Purpose
`src/lsn/preprocessing/`	Mouth-ROI extraction, normalization, .npz writer
`src/lsn/models/`	3D-CNN, EfficientNet, temporal backends, top-level models
`src/lsn/data/`	Datasets, splits, vocab, LRS2 normalize
`src/lsn/training/`	Loop, checkpoint (HF-gated), runner
`src/lsn/evaluation/`	Decoders, metrics, inference, report
`scripts/`	`preprocess.py`, `train.py`, `infer.py`, `report.py` — CLIs
`configs/`	Three YAML experiments — the reproducibility artifact
`tests/`	Smoke tests (model shapes, data contract, config roundtrip)
`results/`	Canonical PNGs + CSVs (committed); checkpoints + predictions are gitignored
`docs/`	`data-format.md`, `future-work.md`, design spec
`notebooks/legacy/`	Original `LSN_TRAINING_EVAL.ipynb` — preserved for lineage

Tests

pytest -v

Backward-compat canary against existing checkpoints:

export LSN_CKPT_DIR="/path/to/checkpoint-dir"   # bash/zsh; in PowerShell use: $env:LSN_CKPT_DIR = "/path/to/checkpoint-dir"
pytest tests/test_checkpoint_compat.py -v
