TY - GEN
T1 - Deep object tracking with multi-modal data
AU - Zhang, Xuezhi
AU - Yuan, Yuan
AU - Lu, Xiaoqiang
N1 - Publisher Copyright: © 2016 IEEE.
PY - 2016/8/16
Y1 - 2016/8/16
N2 - Object tracking is a challenging topic in the field of computer vision since its performance is easily disturbed by occlusion, illumination change, background clutter, scale variation, etc. In this paper, we introduce a robust tracking algorithm that fuses information from both visible images and infrared (IR) images. The proposed tracking algorithm not only incorporates convolutional feature maps from the visible channel, but also employs a scale pyramid representation from the IR channel. We estimate the target location by fusing multilayer convolutional feature maps, and predict the target scale from a scale pyramid. The pipeline of the proposed method is as follows. First, the hierarchical convolutional feature maps are obtained from visible images using VGG-Nets. Then, the accurate target location is predicted by the maximum response of correlation filters with the visible image feature maps. Finally, we obtain the precise object scale with a scale pyramid from infrared images, where the difference between the target and the background is clear. To verify the performance of the proposed method, we capture six video sequences under different conditions. These sequences contain both visible and IR channels. Ten state-of-the-art tracking algorithms are compared with our method, and the experimental results show the effectiveness of the proposed tracker.
AB - Object tracking is a challenging topic in the field of computer vision since its performance is easily disturbed by occlusion, illumination change, background clutter, scale variation, etc. In this paper, we introduce a robust tracking algorithm that fuses information from both visible images and infrared (IR) images. The proposed tracking algorithm not only incorporates convolutional feature maps from the visible channel, but also employs a scale pyramid representation from the IR channel. We estimate the target location by fusing multilayer convolutional feature maps, and predict the target scale from a scale pyramid. The pipeline of the proposed method is as follows. First, the hierarchical convolutional feature maps are obtained from visible images using VGG-Nets. Then, the accurate target location is predicted by the maximum response of correlation filters with the visible image feature maps. Finally, we obtain the precise object scale with a scale pyramid from infrared images, where the difference between the target and the background is clear. To verify the performance of the proposed method, we capture six video sequences under different conditions. These sequences contain both visible and IR channels. Ten state-of-the-art tracking algorithms are compared with our method, and the experimental results show the effectiveness of the proposed tracker.
UR - http://www.scopus.com/inward/record.url?scp=84987673757&partnerID=8YFLogxK
U2 - 10.1109/CITS.2016.7546403
DO - 10.1109/CITS.2016.7546403
M3 - Conference contribution
AN - SCOPUS:84987673757
T3 - IEEE CITS 2016 - 2016 International Conference on Computer, Information and Telecommunication Systems
BT - IEEE CITS 2016 - 2016 International Conference on Computer, Information and Telecommunication Systems
A2 - Gao, Fei
A2 - Li, Zan
A2 - Caballero, Daniel Cascado
A2 - Fan, Jing
A2 - Obaidat, Mohammad S.
A2 - Nicopolitidis, Petros
A2 - Hsiao, Kuei Fang
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2016 International Conference on Computer, Information and Telecommunication Systems, CITS 2016
Y2 - 6 July 2016 through 8 July 2016
ER -