MoRe: Motion-aware Feed-forward 4D Reconstruction Transformer

Junton Fang1*, Zequn Chen2*, Weiqi Zhang1*, Donglin Di2, Xuancheng Zhang1,2, Chengmin Yang2, Yu-Shen Liu1†
1Tsinghua University, 2Li Auto

MoRe explicitly disentangles dynamic motion from static scene structure using Attention-Forcing and Grouped Causal Attention.

Abstract

Reconstructing dynamic 4D scenes remains challenging because moving objects corrupt camera pose estimation. We propose MoRe, a feed-forward network that reconstructs dynamic 4D scenes efficiently.

Our core innovation is the Attention-Forcing strategy, which guides the model to decouple motion cues from background geometry. Additionally, we introduce Grouped Causal Attention for streamable input processing, ensuring temporal consistency across reconstructed frames.

Method Overview

MoRe Architecture
  • Joint Estimation: Simultaneously estimates camera poses, depth, and motion masks.
  • Motion Disentanglement: Decouples dynamic objects from static structures via Attention-Forcing.
  • Efficiency: A feed-forward Transformer architecture designed for real-time 4D reconstruction.
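The page does not detail how Grouped Causal Attention is implemented. As an illustration only, a minimal sketch of a grouped causal attention mask (function name and token layout are assumptions, not from the paper): tokens attend freely within their own frame's group and to all earlier frames, but never to future frames, which is what makes streamable, frame-by-frame processing possible.

```python
import numpy as np

def grouped_causal_mask(num_frames: int, tokens_per_frame: int) -> np.ndarray:
    """Boolean attention mask for grouped causal attention (illustrative).

    Query token i may attend to key token j iff j's frame is the same as
    or earlier than i's frame. Within a frame, attention is bidirectional;
    across frames, it is strictly causal.
    """
    # Map each token to the index of the frame (group) it belongs to.
    frame_idx = np.repeat(np.arange(num_frames), tokens_per_frame)
    # mask[i, j] is True where token i is allowed to attend to token j.
    return frame_idx[:, None] >= frame_idx[None, :]

# Example: 3 frames, 2 tokens per frame -> a 6x6 mask.
mask = grouped_causal_mask(3, 2)
```

In practice such a mask would be passed to the attention operator (e.g. as an additive bias of `-inf` on disallowed positions) so that newly arriving frames can be appended without recomputing attention for past frames.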

BibTeX

@article{fang2025more,
  title={MoRe: Motion-aware Feed-forward 4D Reconstruction Transformer},
  author={Fang, Junton and Chen, Zequn and Zhang, Weiqi and Di, Donglin and Zhang, Xuancheng and Yang, Chengmin and Liu, Yu-Shen},
  journal={arXiv preprint},
  year={2025}
}