Juntong Fang* · Zequn Chen* · Weiqi Zhang* · Donglin Di · Xuancheng Zhang · Chengmin Yang · Yu-Shen Liu†
- [2026.03] Refactored inference scripts and pretrained weights are released.
- [2026.02] MoRe has been accepted by CVPR 2026!
MoRe is a feed-forward 4D reconstruction transformer that efficiently recovers dynamic 3D scenes from monocular videos.
- Motion-Structure Disentanglement: Employs an attention-forcing strategy to separate dynamic motion from static structure.
- Grouped Causal Attention: Captures temporal dependencies and adapts to varying token lengths for coherent geometry.
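To make the grouped causal attention idea concrete, here is a minimal sketch of a block-wise causal mask in plain Python. It rests on an assumption: tokens are partitioned into ordered groups (for example, one group per frame), and each token may attend to every token in its own group and in earlier groups. The function name `grouped_causal_mask` is hypothetical. MoRe itself relies on MagiAttention, and its actual grouping may differ.

```python
from typing import List


def grouped_causal_mask(group_sizes: List[int]) -> List[List[bool]]:
    """Build a boolean attention mask for tokens organized into ordered groups.

    mask[q][k] is True when query token q may attend to key token k,
    i.e. when k's group is the same as or earlier than q's group.
    Assumption: groups are ordered in time and may have different sizes.
    """
    # Map each token position to the index of the group it belongs to.
    group_of = [g for g, size in enumerate(group_sizes) for _ in range(size)]
    return [[gk <= gq for gk in group_of] for gq in group_of]


# Three groups holding 2, 1, and 3 tokens respectively.
mask = grouped_causal_mask([2, 1, 3])
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
```

With group sizes 2, 1, and 3, the first two rows print `xx....`, the third prints `xxx...`, and the last three print `xxxxxx`. Tokens see their whole group plus everything before it, and the group sizes do not need to match.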
Clone the repository and create a conda environment:
# Clone the repository
git clone https://github.com/HellexF/MoRe
cd MoRe
# Create and activate environment
conda create -n more python=3.10 -y
conda activate more
# Install PyTorch and CUDA toolkit
conda install pytorch=2.9.0 torchvision=0.24.0 cudatoolkit=11.8 -c pytorch
conda install cudatoolkit-dev=11.8 -c conda-forge
# Install remaining dependencies
pip install -r requirements.txt
Required Extension: We use MagiAttention to implement grouped causal attention. Please follow their installation guide to enable stream inference.
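After installation, a quick check can confirm that the key packages are visible to the active environment. The sketch below uses only the standard library. `module_status` is a hypothetical helper, and `magi_attention` is an assumed import name for MagiAttention; consult its installation guide for the authoritative one.

```python
import importlib.util
from typing import Dict, Iterable


def module_status(modules: Iterable[str]) -> Dict[str, bool]:
    """Return {module_name: is_installed} without importing the modules."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


# "torch" and "torchvision" come from the install steps above;
# "magi_attention" is an ASSUMED import name for MagiAttention.
status = module_status(["torch", "torchvision", "magi_attention"])
for name, installed in status.items():
    print(f"{name}: {'ok' if installed else 'MISSING'}")
```

A `MISSING` entry for the attention extension only matters for stream inference, which is the mode that requires it.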
We provide pretrained full-attention and stream models. Please download them from Google Drive and place them in the ./pretrained directory. To run inference with the full-attention model:
python inference.py \
--config_path training/config/omniworld_full.yaml \
--ckpt_path pretrained/more_full.pt \
--image_path ./data/example_video \
--output_dir ./results/full_res \
--conf_thres 50.0 \
--predict_motion
We provide training configs for both full-attention and stream training on the Omniworld-Game dataset. Please refer to Omniworld for download instructions and place the dataset in the ./dataset directory. To train the full-attention version, run:
torchrun --nproc_per_node=$GPU_NUM training/launch.py --config omniworld_full
Similarly, to train the stream version, run:
torchrun --nproc_per_node=$GPU_NUM training/launch.py --config omniworld_stream
Run the following scripts to evaluate the camera pose and video depth benchmarks:
# Camera Pose Evaluation
bash eval/relpose/run.sh
# Video Depth Evaluation
bash eval/video_depth/run.sh
This project is built upon VGGT and MagiAttention. We thank the authors for their great repos.
If you find our code or paper useful, please consider citing:
@inproceedings{fang2026moremotionawarefeedforward4d,
title={MoRe: Motion-aware Feed-forward 4D Reconstruction Transformer},
author={Juntong Fang and Zequn Chen and Weiqi Zhang and Donglin Di and Xuancheng Zhang and Chengmin Yang and Yu-Shen Liu},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
year={2026}
}
