MoRe: Motion-aware Feed-forward 4D Reconstruction Transformer

Juntong Fang1*, Zequn Chen2*, Weiqi Zhang1*, Donglin Di2, Xuancheng Zhang1,2, Chengmin Yang2, Yu-Shen Liu1†
1Tsinghua University, 2Li Auto
MoRe Architecture

MoRe explicitly disentangles dynamic motion from static scene structure using Attention-Forcing and Grouped Causal Attention.

Abstract

Reconstructing dynamic 4D scenes remains challenging because moving objects corrupt camera pose estimation. We propose MoRe, a feed-forward 4D reconstruction network that recovers dynamic 3D scenes efficiently.

Our core innovation is the Attention-Forcing strategy, which guides the model to decouple motion cues from background geometry. Additionally, we introduce Grouped Causal Attention for streamable input processing, ensuring temporal consistency across reconstructed frames.
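Grouped Causal Attention for streamable input can be pictured as a block-causal mask: tokens attend freely within their own frame (group) and to all earlier frames, but never to later ones, which preserves temporal order for streaming. The sketch below is a generic illustration of this masking pattern, not the paper's exact formulation; the function name and token layout are assumptions.

```python
import torch

def grouped_causal_mask(num_frames: int, tokens_per_frame: int) -> torch.Tensor:
    """Block-causal attention mask over frame groups (illustrative sketch).

    Tokens within the same frame attend to each other fully; tokens may
    also attend to any token from an earlier frame, but not to later frames.
    Returns an additive mask for scaled dot-product attention:
    0 where attention is allowed, -inf where it is blocked.
    """
    n = num_frames * tokens_per_frame
    # Frame (group) index of each token, assuming frame-major token order.
    frame_id = torch.arange(n) // tokens_per_frame
    # allowed[i, j] is True when token j's frame is not after token i's frame.
    allowed = frame_id.unsqueeze(1) >= frame_id.unsqueeze(0)
    return torch.where(allowed, 0.0, float("-inf"))

mask = grouped_causal_mask(num_frames=3, tokens_per_frame=2)
# Tokens of frame 0 (rows 0-1) are blocked from frames 1-2 (columns 2-5),
# while every token sees its whole own frame.
```

Such a mask can be passed as `attn_mask` to `torch.nn.functional.scaled_dot_product_attention`; because later frames never influence earlier ones, frames can be processed as they arrive without recomputing past outputs.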

Method Overview

  • Joint Estimation: Simultaneously estimates camera poses, depth, and motion masks.
  • Motion Disentanglement: Decouples dynamic objects from static structures via Attention-Forcing.
  • Efficiency: A feed-forward Transformer architecture designed for real-time 4D reconstruction.
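The joint-estimation outputs listed above can be sketched as a simple data structure, plus the motion-mask partition that the disentanglement step implies: dynamic pixels are separated from the static background used for pose and geometry. This is an illustrative sketch under assumed tensor shapes and names (`FrameOutputs`, `split_static_dynamic`, the 0.5 threshold), not the paper's actual interface.

```python
from dataclasses import dataclass
import torch

@dataclass
class FrameOutputs:
    """Per-clip predictions of a feed-forward 4D reconstructor (assumed shapes)."""
    pose: torch.Tensor         # (T, 4, 4) camera-to-world matrices
    depth: torch.Tensor        # (T, H, W) per-pixel depth
    motion_mask: torch.Tensor  # (T, H, W) probability that a pixel is dynamic

def split_static_dynamic(points: torch.Tensor,
                         motion_mask: torch.Tensor,
                         thresh: float = 0.5):
    """Partition back-projected 3D points into static background and
    moving objects using the predicted motion mask (illustrative threshold).

    points: (T, H, W, 3) per-pixel 3D points; motion_mask: (T, H, W).
    Returns (static_points, dynamic_points), each of shape (N, 3).
    """
    dyn = motion_mask > thresh
    return points[~dyn], points[dyn]
```

Keeping only the static partition when solving for camera poses is one way a motion mask can prevent moving objects from corrupting the pose estimate.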

BibTeX

@inproceedings{fang2026moremotionawarefeedforward4d,
      title={MoRe: Motion-aware Feed-forward 4D Reconstruction Transformer}, 
      author={Juntong Fang and Zequn Chen and Weiqi Zhang and Donglin Di and Xuancheng Zhang and Chengmin Yang and Yu-Shen Liu},
      booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
      year={2026}
}