Transformer Based Visual Inertial Odometry

Sicheng Fei; Jingfeng Li; Lei Li; Jie Liang; Jinwen Hu; Dingwen Zhang; Junwei Han

doi:10.1007/978-981-96-2264-1_54

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Visual inertial odometry (VIO) is a sensor fusion technology used for positioning and navigation. It combines visual and inertial sensor information to estimate the movement and location of a UAV in real time. In recent years, deep learning based VIO approaches have outperformed traditional geometric methods. However, VIO tasks usually need to capture long-range feature dependencies to ensure the continuity and consistency of camera motion trajectories over time. In this study, we introduce a new end-to-end transformer-based VIO framework, named VIO-former, to enable the model to better understand motion features in video sequences. Comprehensive quantitative and qualitative evaluations are conducted on the KITTI dataset to test our method. The experimental results show that our approach achieves superior performance compared with existing methods.

Original language	English
Title of host publication	Advances in Guidance, Navigation and Control - Proceedings of 2024 International Conference on Guidance, Navigation and Control Volume 17
Editors	Liang Yan, Haibin Duan, Yimin Deng
Publisher	Springer Science and Business Media Deutschland GmbH
Pages	567-575
Number of pages	9
ISBN (Print)	9789819622634
DOIs	https://doi.org/10.1007/978-981-96-2264-1_54
State	Published - 2025
Event	International Conference on Guidance, Navigation and Control, ICGNC 2024 - Changsha, China Duration: 9 Aug 2024 → 11 Aug 2024

Publication series

Name	Lecture Notes in Electrical Engineering
Volume	1353 LNEE
ISSN (Print)	1876-1100
ISSN (Electronic)	1876-1119

Conference

Conference	International Conference on Guidance, Navigation and Control, ICGNC 2024
Country/Territory	China
City	Changsha
Period	9/08/24 → 11/08/24

Keywords

Sensor fusion
Transformer
Visual inertial odometry

Access to Document

10.1007/978-981-96-2264-1_54

Cite this

Fei, S., Li, J., Li, L., Liang, J., Hu, J., Zhang, D., & Han, J. (2025). Transformer Based Visual Inertial Odometry. In L. Yan, H. Duan, & Y. Deng (Eds.), Advances in Guidance, Navigation and Control - Proceedings of 2024 International Conference on Guidance, Navigation and Control Volume 17 (pp. 567-575). (Lecture Notes in Electrical Engineering; Vol. 1353 LNEE). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-981-96-2264-1_54

Fei, Sicheng ; Li, Jingfeng ; Li, Lei et al. / Transformer Based Visual Inertial Odometry. Advances in Guidance, Navigation and Control - Proceedings of 2024 International Conference on Guidance, Navigation and Control Volume 17. editor / Liang Yan ; Haibin Duan ; Yimin Deng. Springer Science and Business Media Deutschland GmbH, 2025. pp. 567-575 (Lecture Notes in Electrical Engineering).

@inproceedings{478fba4caf394009add9aed950535da8,

title = "Transformer Based Visual Inertial Odometry",

abstract = "Visual inertial odometry (VIO) is a sensor fusion technology used for positioning and navigation. It combines visual and inertial sensor information to estimate the movement and location of a UAV in real time. In recent years, deep learning based VIO approaches have outperformed traditional geometric methods. However, VIO tasks usually need to capture long-range feature dependencies to ensure the continuity and consistency of camera motion trajectories over time. In this study, we introduce a new end-to-end transformer-based VIO framework, named VIO-former, to enable the model to better understand motion features in video sequences. Comprehensive quantitative and qualitative evaluations are conducted on the KITTI dataset to test our method. The experimental results show that our approach achieves superior performance compared with existing methods.",

keywords = "Sensor fusion, Transformer, Visual inertial odometry",

author = "Sicheng Fei and Jingfeng Li and Lei Li and Jie Liang and Jinwen Hu and Dingwen Zhang and Junwei Han",

note = "Publisher Copyright: {\textcopyright} The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025.; International Conference on Guidance, Navigation and Control, ICGNC 2024 ; Conference date: 09-08-2024 Through 11-08-2024",

year = "2025",

doi = "10.1007/978-981-96-2264-1_54",

language = "English",

isbn = "9789819622634",

series = "Lecture Notes in Electrical Engineering",

publisher = "Springer Science and Business Media Deutschland GmbH",

pages = "567--575",

editor = "Liang Yan and Haibin Duan and Yimin Deng",

booktitle = "Advances in Guidance, Navigation and Control - Proceedings of 2024 International Conference on Guidance, Navigation and Control Volume 17",

}

Fei, S, Li, J, Li, L, Liang, J, Hu, J, Zhang, D & Han, J 2025, Transformer Based Visual Inertial Odometry. in L Yan, H Duan & Y Deng (eds), Advances in Guidance, Navigation and Control - Proceedings of 2024 International Conference on Guidance, Navigation and Control Volume 17. Lecture Notes in Electrical Engineering, vol. 1353 LNEE, Springer Science and Business Media Deutschland GmbH, pp. 567-575, International Conference on Guidance, Navigation and Control, ICGNC 2024, Changsha, China, 9/08/24. https://doi.org/10.1007/978-981-96-2264-1_54

Transformer Based Visual Inertial Odometry. / Fei, Sicheng; Li, Jingfeng; Li, Lei et al.
Advances in Guidance, Navigation and Control - Proceedings of 2024 International Conference on Guidance, Navigation and Control Volume 17. ed. / Liang Yan; Haibin Duan; Yimin Deng. Springer Science and Business Media Deutschland GmbH, 2025. p. 567-575 (Lecture Notes in Electrical Engineering; Vol. 1353 LNEE).

TY - GEN

T1 - Transformer Based Visual Inertial Odometry

AU - Fei, Sicheng

AU - Li, Jingfeng

AU - Li, Lei

AU - Liang, Jie

AU - Hu, Jinwen

AU - Zhang, Dingwen

AU - Han, Junwei

N1 - Publisher Copyright: © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025.

PY - 2025

Y1 - 2025

N2 - Visual inertial odometry (VIO) is a sensor fusion technology used for positioning and navigation. It combines visual and inertial sensor information to estimate the movement and location of a UAV in real time. In recent years, deep learning based VIO approaches have outperformed traditional geometric methods. However, VIO tasks usually need to capture long-range feature dependencies to ensure the continuity and consistency of camera motion trajectories over time. In this study, we introduce a new end-to-end transformer-based VIO framework, named VIO-former, to enable the model to better understand motion features in video sequences. Comprehensive quantitative and qualitative evaluations are conducted on the KITTI dataset to test our method. The experimental results show that our approach achieves superior performance compared with existing methods.

KW - Sensor fusion

KW - Transformer

KW - Visual inertial odometry

UR - http://www.scopus.com/inward/record.url?scp=105000700532&partnerID=8YFLogxK

U2 - 10.1007/978-981-96-2264-1_54

DO - 10.1007/978-981-96-2264-1_54

M3 - Conference contribution

AN - SCOPUS:105000700532

SN - 9789819622634

T3 - Lecture Notes in Electrical Engineering

SP - 567

EP - 575

BT - Advances in Guidance, Navigation and Control - Proceedings of 2024 International Conference on Guidance, Navigation and Control Volume 17

A2 - Yan, Liang

A2 - Duan, Haibin

A2 - Deng, Yimin

PB - Springer Science and Business Media Deutschland GmbH

T2 - International Conference on Guidance, Navigation and Control, ICGNC 2024

Y2 - 9 August 2024 through 11 August 2024

ER -

Fei S, Li J, Li L, Liang J, Hu J, Zhang D et al. Transformer Based Visual Inertial Odometry. In Yan L, Duan H, Deng Y, editors, Advances in Guidance, Navigation and Control - Proceedings of 2024 International Conference on Guidance, Navigation and Control Volume 17. Springer Science and Business Media Deutschland GmbH. 2025. p. 567-575. (Lecture Notes in Electrical Engineering). doi: 10.1007/978-981-96-2264-1_54
