Deep learning-based optical flow model for estimating dense motion between video frames.
Status: Early development. The architecture, training pipeline, and evaluation setup are still evolving.
flowww is a lightweight optical flow network designed to estimate dense pixel motion between two consecutive video frames.
The project combines:
- an image encoder for hierarchical feature extraction,
- a correlation / cost-volume module for matching features,
- a ConvGRU-based iterative refinement loop,
- a multi-scale decoder for flow prediction at multiple resolutions,
- training losses including EPE, smoothness, gradient consistency, and photometric warping loss.
The current model has approximately 4.25M parameters.
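A parameter count like the one above is usually obtained by summing the element counts of a module's parameters. The helper below is a minimal sketch; `count_parameters` and the toy network are illustrative stand-ins, not code from this repository.

```python
import torch.nn as nn


def count_parameters(model: nn.Module, trainable_only: bool = True) -> int:
    """Sum the number of elements across a module's parameters."""
    return sum(
        p.numel()
        for p in model.parameters()
        if p.requires_grad or not trainable_only
    )


# Toy stand-in network (NOT the flowww model), used only to show the call.
toy = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(16, 2, kernel_size=3, padding=1),
)
n_params = count_parameters(toy)
print(f"{n_params / 1e6:.4f}M parameters")
```

Applied to the real model, the same call should report a value close to 4.25M if the figure refers to trainable parameters.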
For the most detailed architecture walkthrough, refer to Notebooks/optical_flow.ipynb.
- Dense optical flow estimation for frame pairs
- RAFT-inspired iterative refinement
- Cost-volume construction over local displacements
- ConvGRU-based recurrent update block
- Multi-scale supervision
- Photometric loss with warping
- Edge-aware smoothness regularization
- Training and validation utilities
- Visualization helpers for flow maps, histograms, and inference outputs
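To make the "cost-volume construction over local displacements" item concrete, here is a minimal sketch of a local correlation volume. The function name `local_cost_volume`, the `max_disp` parameter, and the channel-mean normalization are assumptions for illustration; the project's own module may differ.

```python
import torch
import torch.nn.functional as F


def local_cost_volume(feat1: torch.Tensor, feat2: torch.Tensor, max_disp: int = 4) -> torch.Tensor:
    """Correlate feat1 with shifted copies of feat2.

    feat1, feat2: (B, C, H, W) feature maps.
    Returns (B, (2 * max_disp + 1) ** 2, H, W), one channel per displacement.
    """
    _, _, h, w = feat1.shape
    # Zero-pad feat2 so every shifted window stays inside the tensor.
    padded = F.pad(feat2, (max_disp, max_disp, max_disp, max_disp))
    costs = []
    for dy in range(2 * max_disp + 1):
        for dx in range(2 * max_disp + 1):
            shifted = padded[:, :, dy:dy + h, dx:dx + w]
            # Channel-wise mean of the elementwise product = normalized dot product.
            costs.append((feat1 * shifted).mean(dim=1, keepdim=True))
    return torch.cat(costs, dim=1)


torch.manual_seed(0)
f = torch.randn(1, 32, 8, 8)
cv = local_cost_volume(f, f, max_disp=4)
print(cv.shape)  # (1, 81, 8, 8)
```

The loop is written for readability; a production version would typically vectorize the shifts or use a dedicated correlation kernel.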
At a high level, the model follows this pipeline:
- Encoder extracts multi-scale features from both input frames.
- Cost Volume computes local feature correlations between the two feature maps.
- Cost Volume Encoder compresses the correlation tensor.
- ConvGRU iteratively refines a coarse flow estimate.
- Decoder upsamples the hidden state and predicts flow at multiple scales.
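The iterative refinement step in the pipeline above can be sketched as a convolutional GRU that repeatedly updates a hidden state and adds a residual to the flow estimate. `ToyConvGRUCell`, `ToyRefiner`, the channel sizes, and the iteration count are all hypothetical; this is the generic ConvGRU formulation, not the repository's implementation.

```python
import torch
import torch.nn as nn


class ToyConvGRUCell(nn.Module):
    """Standard GRU gates with convolutions in place of matrix products."""

    def __init__(self, hidden_dim: int, input_dim: int, kernel_size: int = 3):
        super().__init__()
        pad = kernel_size // 2
        in_ch = hidden_dim + input_dim
        self.convz = nn.Conv2d(in_ch, hidden_dim, kernel_size, padding=pad)  # update gate
        self.convr = nn.Conv2d(in_ch, hidden_dim, kernel_size, padding=pad)  # reset gate
        self.convq = nn.Conv2d(in_ch, hidden_dim, kernel_size, padding=pad)  # candidate state

    def forward(self, h: torch.Tensor, x: torch.Tensor) -> torch.Tensor:
        hx = torch.cat([h, x], dim=1)
        z = torch.sigmoid(self.convz(hx))
        r = torch.sigmoid(self.convr(hx))
        q = torch.tanh(self.convq(torch.cat([r * h, x], dim=1)))
        return (1 - z) * h + z * q


class ToyRefiner(nn.Module):
    """Refines a flow estimate by predicting a residual at each iteration."""

    def __init__(self, hidden_dim: int = 32, context_dim: int = 16):
        super().__init__()
        self.gru = ToyConvGRUCell(hidden_dim, context_dim + 2)  # +2 for the current flow
        self.flow_head = nn.Conv2d(hidden_dim, 2, kernel_size=3, padding=1)

    def forward(self, h, context, flow, iters: int = 4):
        predictions = []
        for _ in range(iters):
            h = self.gru(h, torch.cat([context, flow], dim=1))
            flow = flow + self.flow_head(h)  # residual update
            predictions.append(flow)
        return predictions


torch.manual_seed(0)
refiner = ToyRefiner()
preds = refiner(
    torch.zeros(1, 32, 8, 8),   # initial hidden state
    torch.randn(1, 16, 8, 8),   # context / encoded cost-volume features
    torch.zeros(1, 2, 8, 8),    # coarse flow initialised to zero
)
print(len(preds), preds[-1].shape)
```

Returning every intermediate prediction makes it possible to supervise each iteration, as RAFT-style models commonly do.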
- Residual Blocks for feature refinement
- Encoder with early downsampling and skip connections
- CostVolume for local matching
- ConvGRUCell for iterative updates
- FlowDecoder for multi-scale flow outputs
- AttentionGate for skip/context fusion
- Warping utilities for photometric supervision
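For the warping utilities, a common approach is to backward-warp the second frame with the predicted flow using `torch.nn.functional.grid_sample` and compare it to the first frame. The sketch below assumes flow in pixel units with channel 0 as horizontal displacement; `warp_with_flow` and `photometric_l1` are illustrative names, and the L1 penalty is only one possible photometric term.

```python
import torch
import torch.nn.functional as F


def warp_with_flow(img2: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """Backward-warp img2 (B, C, H, W) toward frame 1 using flow (B, 2, H, W) in pixels."""
    _, _, h, w = flow.shape
    ys, xs = torch.meshgrid(
        torch.arange(h, dtype=flow.dtype, device=flow.device),
        torch.arange(w, dtype=flow.dtype, device=flow.device),
        indexing="ij",
    )
    # Sampling positions in frame 2 for every pixel of frame 1.
    x = xs.unsqueeze(0) + flow[:, 0]
    y = ys.unsqueeze(0) + flow[:, 1]
    # grid_sample expects coordinates normalised to [-1, 1].
    x = 2.0 * x / max(w - 1, 1) - 1.0
    y = 2.0 * y / max(h - 1, 1) - 1.0
    grid = torch.stack([x, y], dim=-1)  # (B, H, W, 2), last dim ordered (x, y)
    return F.grid_sample(img2, grid, mode="bilinear", padding_mode="border", align_corners=True)


def photometric_l1(img1: torch.Tensor, img2: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """Mean absolute difference between frame 1 and the warped frame 2."""
    return (img1 - warp_with_flow(img2, flow)).abs().mean()


torch.manual_seed(0)
frame = torch.rand(1, 3, 8, 8)
zero_flow = torch.zeros(1, 2, 8, 8)
identity_warp = warp_with_flow(frame, zero_flow)
identity_loss = photometric_l1(frame, frame, zero_flow)
```

With zero flow the warp is the identity, so the loss between a frame and itself is approximately zero, which is a useful sanity check before training.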
flowww/
├── configs/ # Configuration files
├── Notebooks/ # Notebook for architecture and experimentation
├── scripts/ # Training entry points
├── src/ # Dataset, model, loss, and visualization code
├── models/ # Saved model checkpoints
└── requirements.txt
Install the Python dependencies:
pip install -r requirements.txt
The current training setup uses a triplet-based dataset format:
- img1
- img2
- flow
Samples are resized to a fixed resolution, and the flow fields are scaled accordingly.
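Scaling the flow when resizing matters because flow values are pixel displacements: if the width is halved, horizontal displacements must be halved too. A minimal sketch follows, assuming tensors shaped (C, H, W) and flow channel 0 holding the horizontal component; `resize_sample` is a hypothetical helper, not the dataset code in src/.

```python
import torch
import torch.nn.functional as F


def resize_sample(img1, img2, flow, size):
    """Resize an (img1, img2, flow) triplet to size = (new_h, new_w).

    Images are (3, H, W); flow is (2, H, W) in pixels, channel 0 = horizontal.
    """
    _, h, w = flow.shape
    new_h, new_w = size

    def _resize(t):
        return F.interpolate(
            t.unsqueeze(0), size=(new_h, new_w), mode="bilinear", align_corners=False
        ).squeeze(0)

    # Displacements shrink or grow by the same factor as the image axes.
    scale = torch.tensor([new_w / w, new_h / h], dtype=flow.dtype).view(2, 1, 1)
    return _resize(img1), _resize(img2), _resize(flow) * scale


img = torch.rand(3, 8, 16)
flow = torch.stack([torch.full((8, 16), 4.0), torch.full((8, 16), 2.0)])
_, _, flow_small = resize_sample(img, img, flow, size=(4, 8))
print(flow_small[:, 0, 0])  # horizontal 4 -> 2, vertical 2 -> 1
```

Bilinear interpolation can blur flow across motion boundaries; nearest-neighbour resizing of the flow is a common alternative when sharp edges matter.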
Training logic is implemented in scripts/train.py and prototyped in Notebooks/optical_flow.ipynb.
The pipeline includes:
- mixed precision training,
- AdamW optimization,
- OneCycleLR scheduling,
- gradient clipping,
- multi-scale loss,
- photometric loss on finer scales.
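The listed ingredients fit together in a single training step roughly as follows. This is a sketch under stated assumptions: the one-layer model, learning rates, clipping norm, and step count are placeholders, and only a single-scale EPE term stands in for the multi-scale and photometric losses. It is not the content of scripts/train.py.

```python
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"
use_amp = device == "cuda"  # mixed precision only when a GPU is present

model = nn.Conv2d(6, 2, kernel_size=3, padding=1).to(device)  # placeholder for the flow network
total_steps = 10  # placeholder; normally epochs * steps_per_epoch
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.OneCycleLR(optimizer, max_lr=4e-4, total_steps=total_steps)
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)


def train_step(img1, img2, flow_gt):
    optimizer.zero_grad(set_to_none=True)
    with torch.autocast(device_type=device, enabled=use_amp):
        pred = model(torch.cat([img1, img2], dim=1))
        # Single-scale endpoint error as a stand-in for the full loss.
        loss = torch.linalg.vector_norm(pred.float() - flow_gt, dim=1).mean()
    scaler.scale(loss).backward()
    scaler.unscale_(optimizer)  # unscale first so clipping sees the true gradient norms
    nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    scaler.step(optimizer)
    scaler.update()
    scheduler.step()  # OneCycleLR is stepped once per batch
    return loss.item()


torch.manual_seed(0)
a = torch.rand(2, 3, 16, 16, device=device)
b = torch.rand(2, 3, 16, 16, device=device)
gt = torch.randn(2, 2, 16, 16, device=device)
losses = [train_step(a, b, gt) for _ in range(3)]
print(losses)
```

Note that `OneCycleLR` raises an error if stepped more than `total_steps` times, so that value has to match the real length of the run.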
Average Endpoint Error (EPE): 1.33 on inference samples
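Endpoint error is the Euclidean distance between predicted and ground-truth flow vectors at each pixel, averaged over all (valid) pixels. A minimal sketch of the metric, where `average_epe` and the optional `valid` mask are illustrative and may not match how the reported number was computed:

```python
import torch


def average_epe(flow_pred: torch.Tensor, flow_gt: torch.Tensor, valid: torch.Tensor = None) -> torch.Tensor:
    """Mean endpoint error for flows shaped (B, 2, H, W).

    valid: optional (B, H, W) mask; only pixels where it is nonzero are averaged.
    """
    epe = torch.linalg.vector_norm(flow_pred - flow_gt, dim=1)  # (B, H, W)
    if valid is None:
        return epe.mean()
    valid = valid.to(epe.dtype)
    return (epe * valid).sum() / valid.sum().clamp(min=1)


pred = torch.zeros(1, 2, 4, 4)
gt = torch.zeros(1, 2, 4, 4)
gt[:, 0] = 3.0  # horizontal error of 3 px
gt[:, 1] = 4.0  # vertical error of 4 px
epe_value = average_epe(pred, gt).item()
print(epe_value)  # 3-4-5 triangle, so 5.0
```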
This visualization summarizes model predictions, error signals, and flow characteristics in a single view.
The quiver plot represents flow vectors (direction + magnitude) at sampled points.
- U (horizontal displacement)
- V (vertical displacement)
These maps visualize the raw flow components predicted by the model.
- Smooth regions → consistent motion
- Sharp transitions → motion boundaries / object edges
- Noise/artifacts → areas for further model improvement
- This project is still under active development.
- Architecture and training settings may change as the model is refined.
- The notebook is the best reference for the latest implementation details.
This project is licensed under the MIT License. See the LICENSE file for details.


