TY - JOUR
T1 - LoopRefine: Deep Camera Pose Estimation With Loop Consistency
AU - Wang, Zhiwei
AU - Deng, Hui
AU - Shi, Jiawei
AU - Xiang, Mochu
AU - Lu, Zhicheng
AU - Liu, Qi
AU - Dai, Yuchao
N1 - Publisher Copyright: © 2016 IEEE.
PY - 2025
Y1 - 2025
AB - Recently, pose estimation under sparse views (≤ 10) has witnessed significant advances with the development of deep learning. Most existing methods directly regress the absolute poses, demonstrating leading performance on benchmarks. However, directly regressing the scaled poses using deep neural networks is inherently ill-posed, resulting in overfitted models that perform poorly on diverse scenarios. In contrast, we resort to the well-posed solutions from traditional Structure-from-Motion (SfM) pipelines and propose LoopRefine, a diffusion model that assumes known camera intrinsics and estimates pairwise normalized camera relative poses and utilizes triplet coplanar constraints to align the scale of camera poses. Like traditional SfM methods, LoopRefine incrementally constructs camera triplets, and the scale ambiguities are resolved by gradually recovering the scale of poses and connecting the pose graph. To further improve the pose estimation accuracy during inference, we explore pose compatibility by randomly chaining the loop transformations on the pose graph and organizing iterative loop consistency-based optimization. Extensive experiments demonstrate the superiority of our method, and the generalization performance on both object-centered datasets and scene datasets also proves the effectiveness of integrated geometric constraints.
KW - Coplanar Constraints
KW - Diffusion Model
KW - Loop Consistency-based Optimization
KW - Pose estimation
KW - Sparse Views
UR - http://www.scopus.com/inward/record.url?scp=105008896355&partnerID=8YFLogxK
U2 - 10.1109/LRA.2025.3581044
DO - 10.1109/LRA.2025.3581044
M3 - Article
AN - SCOPUS:105008896355
SN - 2377-3766
JO - IEEE Robotics and Automation Letters
JF - IEEE Robotics and Automation Letters
ER -