Learn to Triangulate Scene Coordinates for Visual Localization

Xiang Guo, Tianrui Chen, Bo Li, Qi Liu, Huarong Jia, Yuchao Dai

Research output: Contribution to journal › Article › peer-review

1 Citation (Scopus)

Abstract

Visual localization plays a critical role in robotics. Scene coordinate regression-based localization methods have achieved state-of-the-art performance. However, a gap remains between the regressed scene coordinates and the ground truth scene coordinates, which hinders further improvement in localization accuracy. These methods generally use structure-from-motion (SfM) or depth sensors to generate proxy scene coordinate labels for training, but these proxy labels are contaminated with errors and noise, making them sub-optimal supervision. To resolve this issue, we introduce a simple yet effective triangulation constraint that can be easily incorporated into any scene coordinate regression-based framework. Instead of directly regressing the scene coordinates, our constraint guides the network to learn to triangulate the ground truth scene coordinates without any proxy scene coordinate labels for supervision. Extensive experiments across multiple public datasets show that our triangulation constraint yields significant improvements and even achieves better results without proxy labels for supervision. Furthermore, our method can recover denser and more complete 3D models than SfM and other localization methods.
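The triangulation constraint builds on the classical multi-view triangulation operation: given a pixel's 2D observations in two or more posed views, the corresponding 3D scene coordinate is the point consistent with all projections. As a hedged illustration (this is not the authors' code; the function name and setup are hypothetical), the standard Direct Linear Transform (DLT) solves this for two views:

```python
# Hypothetical sketch of two-view triangulation via the Direct Linear
# Transform (DLT) -- the classical operation underlying triangulation-based
# supervision. Not the paper's implementation.
import numpy as np

def triangulate_dlt(P1, P2, x1, x2):
    """Triangulate a 3D point from two 3x4 projection matrices (P1, P2)
    and the corresponding 2D pixel observations x1, x2 = (u, v)."""
    # Each observation contributes two linear constraints on the
    # homogeneous 3D point X: u * P[2] - P[0] and v * P[2] - P[1].
    A = np.stack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # The solution is the right singular vector of A with the
    # smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]  # dehomogenize
```

In a learning setting, a network's per-pixel predictions can be penalized for deviating from such triangulated points, so supervision comes from geometry and camera poses rather than from noisy SfM or depth-sensor proxy labels.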

Original language: English
Pages (from-to): 3339-3346
Number of pages: 8
Journal: IEEE Robotics and Automation Letters
Volume: 9
Issue number: 4
DOI
Publication status: Published - 1 Apr 2024

