PyTorch re-implementation of LipSyncNet (3D-CNN + EfficientNet-B0 + temporal backbone + CTC). This repo trains and evaluates three backend variants on GRID and reports cross-dataset transfer to LRS2.
Source: refactored from notebooks/legacy/LSN_TRAINING_EVAL.ipynb; see docs/superpowers/specs/2026-04-29-lsn-notebook-to-codebase-design.md for the full design.
(Generated by python scripts/report.py. Committed PNGs / CSVs live in results/.)
- results/learning_curve_run_paper_v1.png: paper-faithful run
- results/learning_curve_run_identity_v1.png: no-temporal-backbone ablation
- results/learning_curve_run_transformer_v1.png: per-stream Transformer encoder
- results/learning_curves_comparison.png: all three overlaid (val loss)
- results/results_table_grid.csv: Table 5 reproduction (CER, WER, word-acc, sentence-acc per model + paper baselines)
- results/qualitative_examples_grid.csv: Table 6 reproduction
- results/results_table_lrs2.csv: cross-dataset transfer
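
CER and WER in the results table are edit-distance error rates. As a rough illustration of what those numbers measure, here is a minimal sketch using plain Levenshtein distance. The function names are hypothetical, and the repo's own metrics in src/lsn/evaluation/ may tokenize or normalize text differently.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: insertions, deletions and substitutions each cost 1."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: character edits divided by reference length."""
    return edit_distance(ref, hyp) / max(len(ref), 1)


def wer(ref: str, hyp: str) -> float:
    """Word error rate: word edits divided by number of reference words."""
    ref_words = ref.split()
    return edit_distance(ref_words, hyp.split()) / max(len(ref_words), 1)


if __name__ == "__main__":
    # Illustrative GRID-style sentence; one of six words differs.
    print(wer("bin blue at f two now", "bin blue at f too now"))
```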
Requires Python ≥3.10 and PyTorch ≥2.1. CUDA is optional (CPU works for inference).
git clone <repo-url>
cd <repo>
pip install -e .[dev]

For preprocessing only, also install:
pip install -e .[preprocessing]
# then download the dlib landmark model:
wget http://dlib.net/files/shape_predictor_68_face_landmarks.dat.bz2
bzip2 -dk shape_predictor_68_face_landmarks.dat.bz2

The .npz clips (see docs/data-format.md for format) are produced by
scripts/preprocess.py (requires the [preprocessing] extras above).
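
To sanity-check a preprocessed clip without relying on any particular key names, something like the following can list what an .npz file contains. This is a minimal sketch: `describe_npz` is a hypothetical helper, not part of the repo, and the authoritative description of the format is docs/data-format.md.

```python
from pathlib import Path

import numpy as np


def describe_npz(path):
    """Return {array_name: (shape, dtype)} for every array stored in an .npz file."""
    # np.load on an .npz returns a lazy archive; the context manager closes it.
    # Note: arrays saved with object dtype would need allow_pickle=True.
    with np.load(Path(path)) as archive:
        return {
            name: (archive[name].shape, str(archive[name].dtype))
            for name in archive.files
        }


if __name__ == "__main__":
    import sys

    for name, (shape, dtype) in describe_npz(sys.argv[1]).items():
        print(f"{name}: shape={shape} dtype={dtype}")
```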
Open a fresh Colab notebook with a GPU runtime. Run these setup cells:
# 1. Clone & install
!git clone <repo-url> lsn
%cd lsn
!pip install -e .
# 2. Mount Drive (data lives here)
from google.colab import drive
drive.mount('/content/drive')
# 3. HF token (only if you'll push checkpoints to Hub)
from google.colab import userdata
import os
os.environ['HF_TOKEN'] = userdata.get('HF_TOKEN')

Then run the CLI:
!python scripts/train.py --config configs/identity.yaml \
--data-dir /content/drive/MyDrive/LSN_Data/grid_processed_new \
--hf-repo ranro1/lipsyncnet-checkpoints

Add the GRID dataset to your notebook (Add data → search by name).
Add HF_TOKEN via Add-ons → Secrets → enable. In the notebook:
!git clone <repo-url> /kaggle/working/lsn
%cd /kaggle/working/lsn
!pip install -e .
from kaggle_secrets import UserSecretsClient
import os
os.environ['HF_TOKEN'] = UserSecretsClient().get_secret('HF_TOKEN')
!python scripts/train.py --config configs/identity.yaml \
--data-dir /kaggle/input/<your-grid-dataset>/grid_processed_new \
--hf-repo ranro1/lipsyncnet-checkpoints

Note: Kaggle commits time out at ~9h. The codebase resumes from the most recent last_checkpoint.pt automatically when --hf-repo is set. Disk requirement: ~5 GB working space for the largest checkpoint.
python scripts/preprocess.py \
--video-dir /path/to/grid_videos \
--align-dir /path/to/grid_align \
--output-dir /path/to/grid_processed_new \
--landmark-path shape_predictor_68_face_landmarks.dat \
--speakers s1 s2 s3 s4 s5

Both --video-dir and --align-dir must contain one subdirectory per speaker
(e.g. s1/, s2/, ...). Output .npz files are written under --output-dir
in the same layout, which is then passed directly to --data-dir for training.
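
Before launching a long preprocessing run, it may be worth confirming that both input roots follow the one-subdirectory-per-speaker layout described above. A minimal sketch, assuming only that layout; `missing_speaker_dirs` is a hypothetical helper and is not something scripts/preprocess.py is known to provide.

```python
from pathlib import Path
from typing import Dict, Iterable, List


def missing_speaker_dirs(video_dir, align_dir, speakers: Iterable[str]) -> Dict[str, List[str]]:
    """Return, per input root, the speaker subdirectories that do not exist."""
    speakers = list(speakers)
    missing: Dict[str, List[str]] = {}
    for label, root in (("video-dir", Path(video_dir)), ("align-dir", Path(align_dir))):
        absent = [s for s in speakers if not (root / s).is_dir()]
        if absent:
            missing[label] = absent
    return missing  # empty dict means the layout looks complete


if __name__ == "__main__":
    problems = missing_speaker_dirs(
        "/path/to/grid_videos", "/path/to/grid_align", ["s1", "s2", "s3", "s4", "s5"]
    )
    print(problems or "layout OK")
```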
python scripts/train.py --config configs/identity.yaml --data-dir <data-dir>

Three committed configs reproduce the three trained models:
configs/identity.yaml, configs/paper.yaml, configs/transformer.yaml.
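
To run the three configs back to back, a small driver along these lines could work. It is a sketch only: it uses just the --config and --data-dir flags shown above, `train_commands` and `train_all` are hypothetical names, and it assumes it is launched from the repo root.

```python
import subprocess
import sys
from typing import List

CONFIGS = [
    "configs/identity.yaml",
    "configs/paper.yaml",
    "configs/transformer.yaml",
]


def train_commands(data_dir: str) -> List[List[str]]:
    """Build one scripts/train.py invocation per committed config."""
    return [
        [sys.executable, "scripts/train.py", "--config", cfg, "--data-dir", data_dir]
        for cfg in CONFIGS
    ]


def train_all(data_dir: str) -> None:
    """Run the three trainings sequentially, stopping at the first failure."""
    for cmd in train_commands(data_dir):
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    train_all(sys.argv[1])
```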
python scripts/infer.py --config configs/identity.yaml \
--weights results/checkpoints/run_identity_v1/best_model.pt \
--dataset grid --data-dir <test-data-dir>

For LRS2 cross-dataset transfer:
python scripts/infer.py --config configs/identity.yaml \
--weights results/checkpoints/run_identity_v1/best_model.pt \
--dataset lrs2 --data-dir <lrs2-test-dir>
python scripts/report.py --predictions-dir results/predictions

| Path | Purpose |
|---|---|
| src/lsn/preprocessing/ | Mouth-ROI extraction, normalization, .npz writer |
| src/lsn/models/ | 3D-CNN, EfficientNet, temporal backends, top-level models |
| src/lsn/data/ | Datasets, splits, vocab, LRS2 normalize |
| src/lsn/training/ | Loop, checkpoint (HF-gated), runner |
| src/lsn/evaluation/ | Decoders, metrics, inference, report |
| scripts/ | preprocess.py, train.py, infer.py, report.py (CLIs) |
| configs/ | Three YAML experiments (the reproducibility artifact) |
| tests/ | Smoke tests (model shapes, data contract, config roundtrip) |
| results/ | Canonical PNGs + CSVs (committed); checkpoints + predictions are gitignored |
| docs/ | data-format.md, future-work.md, design spec |
| notebooks/legacy/ | Original LSN_TRAINING_EVAL.ipynb, preserved for lineage |
pytest -v

Backward-compat canary against existing checkpoints:
$env:LSN_CKPT_DIR = "/path/to/checkpoint-dir"
pytest tests/test_checkpoint_compat.py -v
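
The `$env:` assignment above is PowerShell syntax. For a shell-independent alternative, the same test can be launched from Python with the variable set only for the child process. This is a minimal sketch; `compat_test_invocation` and `run_compat_test` are hypothetical helpers, not part of the repo.

```python
import os
import subprocess
import sys


def compat_test_invocation(ckpt_dir):
    """Return (command, environment) for running the checkpoint-compat test."""
    # Copy the current environment and add LSN_CKPT_DIR for the child only.
    env = dict(os.environ, LSN_CKPT_DIR=str(ckpt_dir))
    cmd = [sys.executable, "-m", "pytest", "tests/test_checkpoint_compat.py", "-v"]
    return cmd, env


def run_compat_test(ckpt_dir) -> int:
    """Run the test and return pytest's exit code (0 means success)."""
    cmd, env = compat_test_invocation(ckpt_dir)
    return subprocess.run(cmd, env=env).returncode


if __name__ == "__main__":
    sys.exit(run_compat_test(sys.argv[1]))
```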