Deep Non-Rigid Structure-From-Motion: A Sequence-to-Sequence Translation Perspective

Hui Deng, Tong Zhang, Yuchao Dai, Jiawei Shi, Yiran Zhong, Hongdong Li

Research output: Contribution to journal › Article › peer-review

Abstract

Directly regressing the non-rigid shape and camera pose from an individual 2D frame is ill-suited to the Non-Rigid Structure-from-Motion (NRSfM) problem. This frame-by-frame 3D reconstruction pipeline overlooks the inherent spatial-temporal nature of NRSfM, i.e., reconstructing the 3D sequence from the input 2D sequence. In this paper, we propose to solve deep sparse NRSfM from a sequence-to-sequence translation perspective, where the input 2D keypoint sequence is taken as a whole to reconstruct the corresponding 3D keypoint sequence in a self-supervised manner. First, we apply a shape-motion predictor to the input sequence to obtain an initial sequence of shapes and corresponding motions. Then, we propose the Context Layer, which enables the deep learning framework to effectively impose sequence-level constraints based on the structural characteristics of non-rigid sequences. The Context Layer builds modules that impose a self-expressiveness regularizer on non-rigid sequences, with multi-head attention (MHA) at their core, combined with temporal encoding; the two act jointly to constrain non-rigid sequences within the deep framework. Experimental results across different datasets, including Human3.6M, CMU Mocap, and InterHand, demonstrate the superiority of our framework. The code will be made publicly available.
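The abstract's Context Layer combines attention over the frame axis (a soft self-expressiveness constraint, where each frame is re-expressed as a weighted combination of all frames) with a temporal encoding that injects frame ordering. A minimal NumPy sketch of that idea follows; it is an illustration, not the authors' architecture: single-head attention stands in for MHA, and all names and sizes are hypothetical.

```python
import numpy as np

def temporal_encoding(seq_len, dim):
    """Sinusoidal temporal encoding (Transformer-style): one vector per frame."""
    pos = np.arange(seq_len)[:, None]          # frame index t
    i = np.arange(dim)[None, :]                # channel index
    angles = pos / np.power(10000.0, (2 * (i // 2)) / dim)
    return np.where(i % 2 == 0, np.sin(angles), np.cos(angles))  # (T, dim)

def self_attention(X):
    """Single-head self-attention over frames: rows of A are softmax
    weights, so each output frame is a convex combination of all input
    frames -- a soft self-expressiveness constraint on the sequence."""
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)              # (T, T) frame-to-frame affinities
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    A = np.exp(scores)
    A /= A.sum(axis=-1, keepdims=True)         # each row sums to 1
    return A @ X, A

# Toy sequence: T frames of flattened 2D keypoints (hypothetical sizes).
T, D = 8, 16
rng = np.random.default_rng(0)
seq = rng.standard_normal((T, D)) + temporal_encoding(T, D)  # add frame order
out, attn = self_attention(seq)
print(out.shape, attn.shape)
```

Without the temporal encoding, attention is permutation-invariant over frames, so shuffling the sequence would leave the constraint unchanged; adding the encoding is what lets the layer exploit temporal order.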

Original language: English
Pages (from-to): 10814-10828
Number of pages: 15
Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence
Volume: 46
Issue number: 12
DOI
Publication status: Published - 2024
