TY - GEN
T1 - A method to build multi-scene datasets for CNN for camera pose regression
AU - Ma, Yuhao
AU - Guo, Hao
AU - Chen, Hong
AU - Tian, Mengxiao
AU - Huo, Xin
AU - Long, Chengjiang
AU - Tang, Shiye
AU - Song, Xiaoyu
AU - Wang, Qing
N1 - Publisher Copyright:
© 2018 IEEE.
PY - 2018/7/2
Y1 - 2018/7/2
N2 - Convolutional neural networks (CNNs) have been shown to be useful for camera pose regression, and they are robust to challenging scenarios such as lighting changes, motion blur, and scenes with many textureless surfaces. Additionally, PoseNet shows that a deep learning system can interpolate camera poses between training images. In this paper, we explore how different dataset-processing strategies affect pose regression and propose a method for building multi-scene datasets for training such neural networks. We demonstrate that the locations of several scenes can be remembered by a single neural network. By combining multiple scenes, we found that the position error of the neural network does not increase significantly as the distance between cameras grows, which means we do not need to train a separate model for each additional scene. We also examine the factors that influence the accuracy of models for multi-scene camera pose regression, which helps us merge several scenes into one dataset more effectively. We have released our code and datasets to the public to facilitate further research.
AB - Convolutional neural networks (CNNs) have been shown to be useful for camera pose regression, and they are robust to challenging scenarios such as lighting changes, motion blur, and scenes with many textureless surfaces. Additionally, PoseNet shows that a deep learning system can interpolate camera poses between training images. In this paper, we explore how different dataset-processing strategies affect pose regression and propose a method for building multi-scene datasets for training such neural networks. We demonstrate that the locations of several scenes can be remembered by a single neural network. By combining multiple scenes, we found that the position error of the neural network does not increase significantly as the distance between cameras grows, which means we do not need to train a separate model for each additional scene. We also examine the factors that influence the accuracy of models for multi-scene camera pose regression, which helps us merge several scenes into one dataset more effectively. We have released our code and datasets to the public to facilitate further research.
KW - Camera pose estimation
KW - Convolutional neural network
KW - Dataset
KW - Visual localization
UR - http://www.scopus.com/inward/record.url?scp=85062191359&partnerID=8YFLogxK
U2 - 10.1109/AIVR.2018.00022
DO - 10.1109/AIVR.2018.00022
M3 - Conference contribution
AN - SCOPUS:85062191359
T3 - Proceedings - 2018 IEEE International Conference on Artificial Intelligence and Virtual Reality, AIVR 2018
SP - 108
EP - 115
BT - Proceedings - 2018 IEEE International Conference on Artificial Intelligence and Virtual Reality, AIVR 2018
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 1st IEEE International Conference on Artificial Intelligence and Virtual Reality, AIVR 2018
Y2 - 10 December 2018 through 12 December 2018
ER -
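
Note: the abstract's pose-regression setup builds on PoseNet, whose training objective combines a translation error with a weighted orientation (quaternion) error. Below is a minimal Python/PyTorch sketch of that standard PoseNet-style loss for context only; it is not the paper's specific implementation, and the function name, tensor layout, and beta value are illustrative assumptions.

    import torch

    def posenet_loss(pred_xyz, pred_quat, gt_xyz, gt_quat, beta=500.0):
        # Translation error in metres (L2 norm per sample).
        pos_err = torch.norm(pred_xyz - gt_xyz, dim=-1)
        # Normalise the predicted quaternion to unit length before comparing.
        pred_quat = pred_quat / torch.norm(pred_quat, dim=-1, keepdim=True)
        rot_err = torch.norm(pred_quat - gt_quat, dim=-1)
        # beta balances metres against quaternion units; it is scene dependent.
        return (pos_err + beta * rot_err).mean()

    # Usage with a batch of 4 hypothetical predictions and ground truths.
    pred_xyz, gt_xyz = torch.randn(4, 3), torch.randn(4, 3)
    pred_q = torch.randn(4, 4)
    gt_q = torch.nn.functional.normalize(torch.randn(4, 4), dim=-1)
    print(posenet_loss(pred_xyz, pred_q, gt_xyz, gt_q))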