MoRe: Motion-aware Feed-forward 4D Reconstruction Transformer

Junton Fang1*, Zequn Chen2*, Weiqi Zhang1*, Donglin Di2, Xuancheng Zhang1,2, Chengmin Yang2, Yu-Shen Liu1†
1Tsinghua University, 2Li Auto

MoRe explicitly disentangles dynamic motion from static scene structure using Attention-Forcing and Grouped Causal Attention.

Abstract

Reconstructing dynamic 4D scenes remains challenging because moving objects corrupt camera pose estimation. We propose MoRe, a feed-forward network that reconstructs dynamic 4D scenes efficiently.

Our core innovation is the Attention-Forcing strategy, which guides the model to decouple motion cues from background geometry. Additionally, we introduce Grouped Causal Attention for streamable input processing, ensuring temporal consistency across reconstructed frames.

Method Overview

MoRe Architecture
  • Joint Estimation: Simultaneously estimates camera poses, depth, and motion masks.
  • Motion Disentanglement: Decouples dynamic objects from static structures via Attention-Forcing.
  • Efficiency: A feed-forward Transformer architecture designed for real-time 4D reconstruction.
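The page does not detail how Grouped Causal Attention is implemented. As an illustration only, a minimal sketch of a grouped causal attention mask (function name and token layout are assumptions, not from the paper): tokens attend freely within their own frame's group and to all earlier frames, but never to future frames, which is what makes streamable, frame-by-frame processing possible.

```python
import numpy as np

def grouped_causal_mask(num_frames: int, tokens_per_frame: int) -> np.ndarray:
    """Boolean attention mask for grouped causal attention (illustrative).

    Query token i may attend to key token j iff j's frame is the same as
    or earlier than i's frame. Within a frame, attention is bidirectional;
    across frames, it is strictly causal.
    """
    # Map each token to the index of the frame (group) it belongs to.
    frame_idx = np.repeat(np.arange(num_frames), tokens_per_frame)
    # mask[i, j] is True where token i is allowed to attend to token j.
    return frame_idx[:, None] >= frame_idx[None, :]

# Example: 3 frames, 2 tokens per frame -> a 6x6 mask.
mask = grouped_causal_mask(3, 2)
```

In practice such a mask would be passed to the attention operator (e.g. as an additive bias of `-inf` on disallowed positions) so that newly arriving frames can be appended without recomputing attention for past frames.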

BibTeX

@article{fang2025more,
  title={MoRe: Motion-aware Feed-forward 4D Reconstruction Transformer},
  author={Fang, Junton and Chen, Zequn and Zhang, Weiqi and Di, Donglin and Zhang, Xuancheng and Yang, Chengmin and Liu, Yu-Shen},
  journal={arXiv preprint},
  year={2025}
}