Official PyTorch implementation of "Unleashing Vision-Language Semantics for Deepfake Video Detection".
Recent Deepfake Video Detection (DFD) studies have demonstrated that pre-trained Vision-Language Models (VLMs) such as CLIP exhibit strong generalization in detecting artifacts across different identities. However, existing approaches leverage visual features only, overlooking the most distinctive strength of VLMs: the rich vision-language semantics embedded in their latent space. We propose VLAForge, a novel DFD framework that unleashes these cross-modal semantics to enhance the model's discriminability in deepfake detection.
This work i) enhances the visual perception of the VLM through a ForgePerceiver, an independent learner that captures diverse, subtle forgery cues both granularly and holistically while preserving the pretrained Vision–Language Alignment (VLA) knowledge, and ii) provides a complementary discriminative cue, the Identity-Aware VLA score, derived by coupling cross-modal semantics with the forgery cues learned by the ForgePerceiver. Notably, the VLA score is augmented by identity prior-informed text prompting that captures authenticity cues tailored to each identity, thereby enabling more discriminative cross-modal semantics. Comprehensive experiments on video DFD benchmarks, covering classical face-swapping forgeries and recent full-face generation forgeries, demonstrate that VLAForge substantially outperforms state-of-the-art methods at both the frame and video levels.
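As a rough, self-contained illustration of the cross-modal scoring idea described above (not the actual VLAForge implementation), the sketch below computes a CLIP image-text similarity between a face crop and a pair of identity-informed prompts. The backbone choice, prompt wording, `identity` string, and image path are all assumptions made for this example.

```python
import torch
import clip  # OpenAI CLIP: https://github.com/openai/CLIP
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
# The backbone choice is an assumption for this sketch, not the one used in VLAForge.
model, preprocess = clip.load("ViT-B/32", device=device)

# Hypothetical identity prior (e.g., the name or tag of the target identity).
identity = "person A"
prompts = clip.tokenize([
    f"a real face video frame of {identity}",         # authenticity prompt
    f"a manipulated face video frame of {identity}",  # forgery prompt
]).to(device)

# Hypothetical path to a cropped face frame.
image = preprocess(Image.open("face_crop.png")).unsqueeze(0).to(device)

with torch.no_grad():
    img_feat = model.encode_image(image)
    txt_feat = model.encode_text(prompts)
    img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
    txt_feat = txt_feat / txt_feat.norm(dim=-1, keepdim=True)
    sims = (img_feat @ txt_feat.T).squeeze(0)  # cosine similarity to each prompt

# One simple way to turn the two similarities into an authenticity score:
# higher means the frame sits closer to the "real" prompt in the joint space.
vla_score = (sims[0] - sims[1]).item()
print(f"illustrative identity-aware VLA score: {vla_score:.4f}")
```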
Run `conda env create -f environment.yml` to create the virtual environment.
- Single NVIDIA GeForce RTX 3090
- FaceForensics++, CDF-v1, CDF-v2, Deepfake Detection Challenge, DeepfakeDetection
- VQGAN, SiT-XL/2, DiT, and PixArt are from DF40 (Celeb-DF).
Step 2. The JSON files are provided in `JSONs`.
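If you want to inspect a split before training, a minimal sketch is shown below. The file name `CDF-v2.json` is hypothetical and no particular schema is assumed; refer to the actual files shipped in `JSONs` for the real naming and structure.

```python
import json
from pathlib import Path

# Illustrative only: the file name below is hypothetical; use whichever
# split file in JSONs matches your chosen dataset.
split_file = Path("JSONs") / "CDF-v2.json"

with split_file.open() as f:
    split = json.load(f)

# Print a quick summary without assuming any particular schema.
print(type(split).__name__)
if isinstance(split, dict):
    print("top-level keys:", list(split)[:10])
elif isinstance(split, list):
    print("number of entries:", len(split), "first entry:", split[0])
```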
- Set `test_dataset` to the name of the test dataset in `test.yaml`. Then, run `bash test.sh` (a generic sketch of pooling frame-level scores into a video-level prediction follows this list).
- Set `train_dataset` to the name of the training dataset in `train.yaml`. Then, train your own weights by running `bash train.sh`.
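Results are reported at both the frame and video level. The snippet below is only a generic illustration of how per-frame scores are commonly pooled (here, mean pooling) into a video-level prediction and scored with AUC; the actual evaluation protocol used by `test.sh` may differ, and the scores and labels are made up for illustration.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Hypothetical per-frame fake probabilities grouped by video, plus video-level
# labels (1 = fake, 0 = real). Values are made up for illustration only.
frame_scores = {
    "video_0001": np.array([0.91, 0.88, 0.95]),
    "video_0002": np.array([0.12, 0.07, 0.20]),
}
video_labels = {"video_0001": 1, "video_0002": 0}

# Video-level score = mean of its frame-level scores (a common pooling choice).
names = sorted(frame_scores)
video_scores = np.array([frame_scores[n].mean() for n in names])
labels = np.array([video_labels[n] for n in names])

print("video-level AUC:", roc_auc_score(labels, video_scores))
```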
If you find the implementation useful, we would appreciate your acknowledgement by citing our VLAForge paper:
```bibtex
@inproceedings{zhu2026dfd,
  title={Unleashing Vision-Language Semantics for Deepfake Video Detection},
  author={Zhu, Jiawen and Miao, Yunqi and Zhang, Xueyi and Deng, Jiankang and Pang, Guansong},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  year={2026}
}
```