TY - JOUR
T1 - Learn to Triangulate Scene Coordinates for Visual Localization
AU - Guo, Xiang
AU - Chen, Tianrui
AU - Li, Bo
AU - Liu, Qi
AU - Jia, Huarong
AU - Dai, Yuchao
N1 - Publisher Copyright: © 2016 IEEE.
PY - 2024/4/1
Y1 - 2024/4/1
N2 - Visual localization plays a critical role in robotics. The scene coordinate regression-based localization methods have achieved state-of-the-art performance. However, current methods still have a gap between the regressed scene coordinates and the ground truth scene coordinates, which hinders the improvement of localization accuracy. These methods generally use structure-from-motion (SfM) or depth sensors to generate proxy scene coordinate supervision labels for training, but these proxy labels are contaminated with errors and noises, which are sub-optimal for training. To resolve this issue, we introduce a simple yet effective triangulation constraint, which could be easily incorporated into any scene coordinate regression-based framework. Instead of directly regressing the scene coordinates, our constraint reinforces the network, which learns to triangulate the ground truth scene coordinates without any proxy scene coordinate labels for supervision. Extensive experiments across multiple public datasets show that our triangulation constraint establishes significant improvement and even achieves better results without proxy labels for supervision. Furthermore, our method could recover denser and more complete 3D models compared with the SfM and other localization methods.
AB - Visual localization plays a critical role in robotics. The scene coordinate regression-based localization methods have achieved state-of-the-art performance. However, current methods still have a gap between the regressed scene coordinates and the ground truth scene coordinates, which hinders the improvement of localization accuracy. These methods generally use structure-from-motion (SfM) or depth sensors to generate proxy scene coordinate supervision labels for training, but these proxy labels are contaminated with errors and noises, which are sub-optimal for training. To resolve this issue, we introduce a simple yet effective triangulation constraint, which could be easily incorporated into any scene coordinate regression-based framework. Instead of directly regressing the scene coordinates, our constraint reinforces the network, which learns to triangulate the ground truth scene coordinates without any proxy scene coordinate labels for supervision. Extensive experiments across multiple public datasets show that our triangulation constraint establishes significant improvement and even achieves better results without proxy labels for supervision. Furthermore, our method could recover denser and more complete 3D models compared with the SfM and other localization methods.
KW - Mapping
KW - simultaneous localization and mapping (SLAM)
KW - structure-from-motion (SfM)
KW - visual localization
UR - http://www.scopus.com/inward/record.url?scp=85184804441&partnerID=8YFLogxK
U2 - 10.1109/LRA.2024.3362637
DO - 10.1109/LRA.2024.3362637
M3 - Article
AN - SCOPUS:85184804441
SN - 2377-3766
VL - 9
SP - 3339
EP - 3346
JO - IEEE Robotics and Automation Letters
JF - IEEE Robotics and Automation Letters
IS - 4
ER -