FlowCLIP: ActionCLIP with Learnable Optical Flow

PyTorch implementation of FlowCLIP, an extension of ActionCLIP: A New Paradigm for Video Action Recognition [arXiv] with a learnable optical-flow modality for fine-grained human-action recognition.

About this Repository

FlowCLIP builds on the original ActionCLIP codebase and adds optical flow as a complementary temporal modality alongside RGB. The repository contains the modified architecture, the training and testing pipeline, and an HMDB-51 demonstration workflow: a ViT-B/16 model with a Kinetics-400 pretrained backbone and Transformer fusion is fine-tuned for 30 epochs, evaluated at 256 px resolution, and scored by Top-1/Top-5 accuracy on HMDB-51. A Colab notebook is also provided for zero-shot inference on a single video using the pretrained K400 backbone.

Updates

2026.05: Added optical flow support as a complementary temporal modality (FlowCLIP). See the Optical Flow section for details.
2022.01: Added Google Drive download links for the trained models.

Overview

Content

Prerequisites
Data Preparation
Optical Flow
Updates
Pretrained Models
- Kinetics-400
- HMDB51 && UCF101
Testing
Training
Maintainer
Contributors
Citing ActionCLIP
Acknowledgments

Prerequisites

The code is built with the following libraries:

PyTorch >= 1.8
wandb
RandAugment
pprint
tqdm
dotmap
yaml
csv
OpenCV or RAFT (for optical flow extraction)

For video data pre-processing, you may need ffmpeg.

See INSTALL.md for more detailed information about the libraries.

Data Preparation

Videos must first be decoded into frames for fast reading. Please refer to the TSN repo for a detailed guide to data pre-processing.

We have successfully trained on Kinetics, UCF101, HMDB51, and Charades.
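
As a rough illustration of the frame-extraction step, the sketch below builds an ffmpeg command that decodes one video into numbered JPEG frames. The function name, the one-folder-per-video layout, and the quality setting are assumptions made for this example; the TSN guide remains the reference for the actual pre-processing. The `img_%05d.jpg` pattern matches the `img_00001.jpg` naming used in the Directory Structure section.

```python
from pathlib import Path


def build_ffmpeg_command(video_path, frames_root):
    """Return the ffmpeg argument list that decodes one video into JPEG frames."""
    video_path = Path(video_path)
    # One output folder per video, named after the file stem (assumed layout).
    out_dir = Path(frames_root) / video_path.stem
    return [
        "ffmpeg",
        "-i", str(video_path),
        "-q:v", "2",  # JPEG quality scale; lower values mean higher quality
        str(out_dir / "img_%05d.jpg"),  # img_00001.jpg, img_00002.jpg, ...
    ]


cmd = build_ffmpeg_command("videos/example.avi", "data/rgb_frames")
print(" ".join(cmd))
```

To run the command, create the output folder first and pass the list to `subprocess.run(cmd, check=True)`.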

Optical Flow

FlowCLIP introduces optical flow as an additional temporal modality on top of the RGB stream used by ActionCLIP. The flow stream captures motion cues that complement appearance features, improving recognition on motion-intensive action classes.

Extracting Optical Flow

Use the provided helper script utils/extract_optical_flow.py to extract per-frame optical flow from the decoded RGB frames:

# TV-L1 (via OpenCV)
python utils/extract_optical_flow.py --method tvl1 --src /path/to/rgb_frames --dst /path/to/flow_frames

# RAFT
python utils/extract_optical_flow.py --method raft --src /path/to/rgb_frames --dst /path/to/flow_frames
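
Flow fields are real-valued, so they have to be quantized before they can be saved as 8-bit images. The sketch below shows one common convention from two-stream pipelines: clip displacements to a fixed bound and rescale them linearly to [0, 255], one image per component. The function names and the bound of 20 pixels are assumptions for illustration; utils/extract_optical_flow.py may use a different scheme.

```python
import numpy as np


def flow_to_uint8(flow, bound=20.0):
    """Map a float flow field of shape (H, W, 2) to two uint8 images (x, y)."""
    # Clip displacements to [-bound, bound] pixels, then rescale to [0, 255].
    clipped = np.clip(flow, -bound, bound)
    scaled = np.round((clipped + bound) * (255.0 / (2.0 * bound)))
    quantized = scaled.astype(np.uint8)
    return quantized[..., 0], quantized[..., 1]


def uint8_to_flow(flow_x, flow_y, bound=20.0):
    """Approximate inverse of flow_to_uint8, back to float displacements."""
    stacked = np.stack([flow_x, flow_y], axis=-1).astype(np.float32)
    return stacked * (2.0 * bound / 255.0) - bound
```

The round trip is lossy: displacements beyond the bound saturate, and values inside it are recovered only to within half a quantization step.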

Directory Structure

data/
  rgb_frames/<video_id>/img_00001.jpg ...
  flow_frames/<video_id>/flow_x_00001.jpg, flow_y_00001.jpg ...
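
Given this layout, a loader needs to match each RGB frame with its two flow components. The following is a minimal sketch of that pairing, assuming the file names shown above; the function name is hypothetical and the repository's dataset code may organize this differently.

```python
from pathlib import Path


def list_frame_triplets(data_root, video_id):
    """Return (rgb, flow_x, flow_y) path triplets for frames that have all three files."""
    rgb_dir = Path(data_root) / "rgb_frames" / video_id
    flow_dir = Path(data_root) / "flow_frames" / video_id
    triplets = []
    for rgb_path in sorted(rgb_dir.glob("img_*.jpg")):
        index = rgb_path.stem.split("_")[-1]  # "img_00001" -> "00001"
        flow_x = flow_dir / f"flow_x_{index}.jpg"
        flow_y = flow_dir / f"flow_y_{index}.jpg"
        # Skip frames without flow, e.g. a final frame that has no successor.
        if flow_x.exists() and flow_y.exists():
            triplets.append((rgb_path, flow_x, flow_y))
    return triplets
```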

Training with Optical Flow

Enable the flow stream in the YAML config:

data:
  use_optical_flow: True
  flow_prefix: flow_

Then launch training with the flow-enabled config:

# train with optical flow
bash scripts/run_train.sh ./configs/k400/k400_train_flow.yaml
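
To make the effect of the two config keys concrete, here is a small sketch of how a data loader might turn them into file names for one frame. It works on a plain dict standing in for the parsed `data` section of the YAML file; the function name and the defaults are assumptions, not the repository's actual loader.

```python
def frame_filenames(data_cfg, index):
    """Return the file names to load for one frame under the given data config."""
    names = [f"img_{index:05d}.jpg"]
    if data_cfg.get("use_optical_flow", False):
        prefix = data_cfg.get("flow_prefix", "flow_")
        # One file per flow component, e.g. flow_x_00001.jpg and flow_y_00001.jpg.
        names += [f"{prefix}x_{index:05d}.jpg", f"{prefix}y_{index:05d}.jpg"]
    return names


data_cfg = {"use_optical_flow": True, "flow_prefix": "flow_"}
print(frame_filenames(data_cfg, 1))
# ['img_00001.jpg', 'flow_x_00001.jpg', 'flow_y_00001.jpg']
```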

Notes

Flow frames are stored as JPEG (x and y components in separate files).
RGB and flow features are combined via feature-level fusion before the temporal Transformer.
The flow stream improves accuracy on motion-intensive classes.
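
The notes do not say which fusion operator is used, so the sketch below shows only one common form of feature-level fusion: concatenating per-frame RGB and flow features and projecting them back to the original width. It is written in NumPy to illustrate the shapes; the function name, the dimensions, and the concatenate-then-project choice are all assumptions, and in the repository this step would be a PyTorch module.

```python
import numpy as np


def fuse_features(rgb_feats, flow_feats, weight, bias):
    """Concatenate per-frame RGB and flow features and project back to width D.

    rgb_feats, flow_feats: (T, D) arrays, one row per frame.
    weight: (2 * D, D) projection matrix; bias: (D,) vector.
    """
    fused = np.concatenate([rgb_feats, flow_feats], axis=-1)  # (T, 2D)
    return fused @ weight + bias  # (T, D), the input to the temporal Transformer


rng = np.random.default_rng(0)
T, D = 8, 512  # illustrative frame count and feature width
rgb = rng.standard_normal((T, D))
flow = rng.standard_normal((T, D))
weight = rng.standard_normal((2 * D, D)) * 0.02
bias = np.zeros(D)
print(fuse_features(rgb, flow, weight, bias).shape)  # (8, 512)
```

Because the output keeps the shape (T, D), a temporal model built for RGB-only features can consume the fused sequence without further changes.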

Pretrained Models

Training video models is computationally expensive, so we provide some of the pretrained models here. A larger set of trained models is listed in the ActionCLIP MODEL_ZOO.md.

Kinetics-400

We evaluate ActionCLIP on K400 with different backbones and input-frame configurations (we choose Transf as the final visual prompt since it obtains the best results). The pretrained models we provide are listed below (see Table 6 of the paper). *Note: the training log for the 8-frame ViT-B/32 model is in ViT32_8F_K400.log.

model	n-frame	top1 Acc(single-crop)	top5 Acc(single-crop)	checkpoint
ViT-B/32	8	78.36%	94.25%	link pwd:b5ni
ViT-B/16	8	81.09%	95.49%	link pwd:hqtv
ViT-B/16	16	81.68%	95.87%	link pwd:dk4r
ViT-B/16	32	82.32%	96.20%	link pwd:35uu

HMDB51 && UCF101

On the HMDB51 and UCF101 datasets, accuracy (with K400 pretraining) is reported under the accurate setting.

HMDB51

model	n-frame	top1 Acc(single-crop)	checkpoint
ViT-B/16	32	76.2%	link

UCF101

model	n-frame	top1 Acc(single-crop)	checkpoint
ViT-B/16	32	97.1%	link

Testing

To test the downloaded pretrained models on Kinetics, HMDB51, or UCF101, run scripts/run_test.sh. For example:

# test
bash scripts/run_test.sh  ./configs/k400/k400_test.yaml
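
The tables above report top-1 and top-5 accuracy. As a reminder of what those numbers mean, here is a minimal NumPy sketch of top-k accuracy on a toy score matrix; the function name and the data are illustrative and are not taken from test.py.

```python
import numpy as np


def topk_accuracy(scores, labels, k=1):
    """Fraction of samples whose true label is among the k highest scores.

    scores: (N, C) array of class scores; labels: (N,) integer class ids.
    """
    # Class indices sorted by descending score, truncated to the best k.
    topk = np.argsort(-scores, axis=1)[:, :k]
    hits = (topk == labels[:, None]).any(axis=1)
    return float(hits.mean())


scores = np.array([[0.1, 0.7, 0.2],
                   [0.5, 0.3, 0.2],
                   [0.2, 0.3, 0.5]])
labels = np.array([1, 1, 2])
print(topk_accuracy(scores, labels, k=1))  # 2 of 3 samples correct
print(topk_accuracy(scores, labels, k=2))  # 3 of 3 samples correct
```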

Zero-shot

We provide several examples of zero-shot validation on Kinetics-400, UCF101, and HMDB51.

To do zero-shot validation on Kinetics from CLIP pretrained models, you can run:

# zero-shot
bash scripts/run_test.sh  ./configs/k400/k400_ft_zero_shot.yaml

To do zero-shot validation on UCF101 and HMDB51 from Kinetics pretrained models, first prepare the K400 pretrained model, then run:

# zero-shot
bash scripts/run_test.sh  ./configs/hmdb51/hmdb_ft_zero_shot.yaml
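
Conceptually, zero-shot recognition in a CLIP-style model compares a video embedding against the text embeddings of class prompts and picks the closest one. The sketch below illustrates that matching step with toy vectors. In practice the embeddings would come from the video and text encoders; the function name, the temperature of 100, and the toy data are assumptions for this example.

```python
import numpy as np


def zero_shot_predict(video_emb, text_embs, class_names, temperature=100.0):
    """Pick the class whose prompt embedding is closest to the video embedding.

    video_emb: (D,) video feature; text_embs: (C, D), one row per class prompt.
    """
    v = video_emb / np.linalg.norm(video_emb)
    t = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    logits = temperature * (t @ v)  # scaled cosine similarities, shape (C,)
    probs = np.exp(logits - logits.max())  # subtract the max for stability
    probs /= probs.sum()  # softmax over classes
    return class_names[int(np.argmax(probs))], probs


class_names = ["brush hair", "cartwheel", "catch"]
text_embs = np.eye(3)  # toy prompt embeddings, one per class
video_emb = np.array([0.1, 0.9, 0.2])  # toy video embedding
label, probs = zero_shot_predict(video_emb, text_embs, class_names)
print(label)  # cartwheel
```

No classifier weights are trained for the target classes, which is why a new label set only requires new prompts.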

Training

We provide several examples of training ActionCLIP with this repo:

To train on Kinetics from CLIP pretrained models, you can run:

# train
bash scripts/run_train.sh  ./configs/k400/k400_train.yaml

To train on HMDB51 from Kinetics400 pretrained models, you can run:

# train
bash scripts/run_train.sh  ./configs/hmdb51/hmdb_train.yaml

To train on UCF101 from Kinetics400 pretrained models, you can run:

# train
bash scripts/run_train.sh  ./configs/ucf101/ucf_train.yaml

To train with optical flow (FlowCLIP) on Kinetics-400, you can run:

# train with optical flow
bash scripts/run_train.sh  ./configs/k400/k400_train_flow.yaml

More training details can be found in configs/README.md.

Maintainer

This FlowCLIP repository is maintained by Srikanth Baride and is derived from the original ActionCLIP codebase by Mengmeng Wang and Jiazheng Xing.

Contributors

ActionCLIP is written and maintained by Mengmeng Wang and Jiazheng Xing.

Citing ActionCLIP

If you find ActionCLIP useful in your research, please cite our paper.

Acknowledgments

Our code is based on CLIP and STM.
