torrwill/team9capstone


LipSyncNet (LSN) — Capstone

PyTorch re-implementation of LipSyncNet (3D-CNN + EfficientNet-B0 + temporal backbone + CTC). This repo trains and evaluates three backend variants on GRID and reports cross-dataset transfer to LRS2.
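To illustrate the CTC output stage mentioned above, here is a minimal sketch of greedy (best-path) CTC decoding in plain Python. The vocabulary, the blank index of 0, and the function name `ctc_greedy_decode` are assumptions made for this example; the decoders in this repo may differ.

```python
from itertools import groupby

# Hypothetical character vocabulary; index 0 is ASSUMED to be the CTC blank.
VOCAB = ["<blank>", " "] + [chr(c) for c in range(ord("a"), ord("z") + 1)]
BLANK = 0


def ctc_greedy_decode(frame_ids, blank=BLANK, vocab=VOCAB):
    """Collapse consecutive repeats, then drop blanks (best-path decoding)."""
    collapsed = [idx for idx, _ in groupby(frame_ids)]
    return "".join(vocab[idx] for idx in collapsed if idx != blank)


# Per-frame argmax indices spelling "bin": b b <blank> i i n <blank>
frames = [3, 3, 0, 10, 10, 15, 0]
print(ctc_greedy_decode(frames))
```

The blank symbol is what lets CTC emit doubled letters: `[13, 0, 13]` decodes to "ll", whereas `[13, 13]` collapses to a single "l".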

Source: refactored from notebooks/legacy/LSN_TRAINING_EVAL.ipynb — see docs/superpowers/specs/2026-04-29-lsn-notebook-to-codebase-design.md for the full design.


Results

(Generated by python scripts/report.py. Committed PNGs / CSVs live in results/.)

  • results/learning_curve_run_paper_v1.png — paper-faithful run
  • results/learning_curve_run_identity_v1.png — no-temporal-backbone ablation
  • results/learning_curve_run_transformer_v1.png — per-stream Transformer encoder
  • results/learning_curves_comparison.png — all three overlaid (val loss)
  • results/results_table_grid.csv — Table 5 reproduction (CER, WER, word-acc, sentence-acc per model + paper baselines)
  • results/qualitative_examples_grid.csv — Table 6 reproduction
  • results/results_table_lrs2.csv — cross-dataset transfer
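For readers unfamiliar with the CER and WER columns listed above, the sketch below shows one common way to compute them: Levenshtein edit distance over characters (CER) or whitespace-separated words (WER), divided by the reference length. The function names are illustrative, and the repo's own metrics code may normalize or aggregate differently (for example, corpus-level rather than per-sentence).

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or word lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,           # deletion
                            curr[j - 1] + 1,       # insertion
                            prev[j - 1] + cost))   # substitution or match
        prev = curr
    return prev[-1]


def cer(ref, hyp):
    """Character error rate: character edits / reference length."""
    return edit_distance(ref, hyp) / max(len(ref), 1)


def wer(ref, hyp):
    """Word error rate: word edits / number of reference words."""
    ref_words = ref.split()
    return edit_distance(ref_words, hyp.split()) / max(len(ref_words), 1)


# A GRID-style sentence with one wrong word ("two" -> "too").
ref = "bin blue at f two now"
hyp = "bin blue at f too now"
print(f"CER={cer(ref, hyp):.4f}  WER={wer(ref, hyp):.4f}")
```

In this example a single substituted character gives a CER of 1/21 but a WER of 1/6, which is why the two metrics are usually reported side by side.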

Setup — local

Requires Python ≥3.10 and PyTorch ≥2.1. CUDA is optional; CPU works for inference.

git clone <repo-url>
cd <repo>
pip install -e ".[dev]"

To run preprocessing, also install the preprocessing extras:

pip install -e ".[preprocessing]"
# then download the dlib landmark model:
wget http://dlib.net/files/shape_predictor_68_face_landmarks.dat.bz2
bzip2 -dk shape_predictor_68_face_landmarks.dat.bz2

The .npz clips (see docs/data-format.md for format) are produced by scripts/preprocess.py (requires the [preprocessing] extras above).

Setup — Colab

Open a fresh Colab notebook with a GPU runtime. Run these setup cells:

# 1. Clone & install
!git clone <repo-url> lsn
%cd lsn
!pip install -e .

# 2. Mount Drive (data lives here)
from google.colab import drive
drive.mount('/content/drive')

# 3. HF token (only if you'll push checkpoints to Hub)
from google.colab import userdata
import os
os.environ['HF_TOKEN'] = userdata.get('HF_TOKEN')

Then run the CLI:

!python scripts/train.py --config configs/identity.yaml \
    --data-dir /content/drive/MyDrive/LSN_Data/grid_processed_new \
    --hf-repo ranro1/lipsyncnet-checkpoints

Setup — Kaggle

Add the GRID dataset to your notebook (Add data → search by name). Add HF_TOKEN under Add-ons → Secrets and enable it for the notebook. Then, in the notebook:

!git clone <repo-url> /kaggle/working/lsn
%cd /kaggle/working/lsn
!pip install -e .

from kaggle_secrets import UserSecretsClient
import os
os.environ['HF_TOKEN'] = UserSecretsClient().get_secret('HF_TOKEN')

!python scripts/train.py --config configs/identity.yaml \
    --data-dir /kaggle/input/<your-grid-dataset>/grid_processed_new \
    --hf-repo ranro1/lipsyncnet-checkpoints

Note: Kaggle commits time out at ~9h. The codebase resumes from the most recent last_checkpoint.pt automatically when --hf-repo is set. Disk requirement: ~5 GB working space for the largest checkpoint.


Usage

Preprocess GRID videos → .npz clips

python scripts/preprocess.py \
    --video-dir /path/to/grid_videos \
    --align-dir /path/to/grid_align \
    --output-dir /path/to/grid_processed_new \
    --landmark-path shape_predictor_68_face_landmarks.dat \
    --speakers s1 s2 s3 s4 s5

Both --video-dir and --align-dir must contain one subdirectory per speaker (e.g. s1/, s2/, ...). Output .npz files are written under --output-dir in the same layout, which is then passed directly to --data-dir for training.
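Because a missing speaker folder is easy to overlook, a quick pre-flight check of the layout described above can save a failed run. The helper below is a hypothetical sketch using only `pathlib`; it is not part of scripts/preprocess.py, and the paths are the same placeholders used in the command above.

```python
from pathlib import Path


def missing_speaker_dirs(video_dir, align_dir, speakers):
    """Return (root, speaker) pairs whose per-speaker subdirectory is absent."""
    missing = []
    for root in (Path(video_dir), Path(align_dir)):
        for spk in speakers:
            if not (root / spk).is_dir():
                missing.append((str(root), spk))
    return missing


if __name__ == "__main__":
    # Placeholder paths; substitute your own GRID locations.
    problems = missing_speaker_dirs(
        "/path/to/grid_videos",
        "/path/to/grid_align",
        ["s1", "s2", "s3", "s4", "s5"],
    )
    for root, spk in problems:
        print(f"missing: {root}/{spk}")
```

An empty result means every requested speaker has a subdirectory under both roots.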

Train one experiment

python scripts/train.py --config configs/identity.yaml --data-dir <data-dir>

Three committed configs reproduce the three trained models: configs/identity.yaml, configs/paper.yaml, configs/transformer.yaml.

Inference (writes JSON predictions)

python scripts/infer.py --config configs/identity.yaml \
    --weights results/checkpoints/run_identity_v1/best_model.pt \
    --dataset grid --data-dir <test-data-dir>

For LRS2 cross-dataset transfer:

python scripts/infer.py --config configs/identity.yaml \
    --weights results/checkpoints/run_identity_v1/best_model.pt \
    --dataset lrs2 --data-dir <lrs2-test-dir>

Report (plots + CSV tables)

python scripts/report.py --predictions-dir results/predictions

Project layout

Path                     Purpose
src/lsn/preprocessing/   Mouth-ROI extraction, normalization, .npz writer
src/lsn/models/          3D-CNN, EfficientNet, temporal backends, top-level models
src/lsn/data/            Datasets, splits, vocab, LRS2 normalize
src/lsn/training/        Loop, checkpoint (HF-gated), runner
src/lsn/evaluation/      Decoders, metrics, inference, report
scripts/                 CLIs: preprocess.py, train.py, infer.py, report.py
configs/                 Three YAML experiments (the reproducibility artifact)
tests/                   Smoke tests (model shapes, data contract, config roundtrip)
results/                 Canonical PNGs + CSVs (committed); checkpoints + predictions are gitignored
docs/                    data-format.md, future-work.md, design spec
notebooks/legacy/        Original LSN_TRAINING_EVAL.ipynb, preserved for lineage

Tests

pytest -v

Backward-compat canary against existing checkpoints:

# PowerShell syntax; in bash/zsh use: export LSN_CKPT_DIR="/path/to/checkpoint-dir"
$env:LSN_CKPT_DIR = "/path/to/checkpoint-dir"
pytest tests/test_checkpoint_compat.py -v

About

Reproduction of LipSyncNet for CS668.
