TY - JOUR
T1 - Efficient thermal infrared tracking with cross-modal compress distillation
AU - Li, Hangfei
AU - Zha, Yufei
AU - Li, Huanyu
AU - Zhang, Peng
AU - Huang, Wei
N1 - Publisher Copyright: © 2023 Elsevier Ltd
PY - 2023/8
Y1 - 2023/8
AB - The key issue in thermal infrared tracking is using neural networks to represent the target effectively and efficiently in the thermal infrared domain. The lack of thermal infrared training datasets makes it difficult to train a robust infrared object tracker from scratch, and time-consuming convolution operations slow tracking down. To address these problems, we propose cross-modal compression distillation, which represents thermal infrared objects for tracking by leveraging an off-the-shelf RGB model through knowledge distillation. Specifically, cross-modal distillation transfers knowledge from the RGB modality to the thermal infrared modality by feeding paired RGB and thermal infrared images into the two branches of a Siamese network. Additionally, based on the teacher–student architecture, the feature extractor is compressed into a lightweight model through model pruning and multi-level deep feature matching. Experimental results on the LSOTB-TIR and PTB-TIR datasets show that the thermal infrared tracking models distilled by the proposed method track faster and perform better than the baseline RGB tracker, with gains of 1.5% in Success Rate, 2.2% in Precision, 1.9% in Normalized Precision, and 58 frames per second (FPS) on the LSOTB-TIR dataset.
KW - Cross-modal
KW - Knowledge distillation
KW - Thermal infrared tracking
UR - http://www.scopus.com/inward/record.url?scp=85158039556&partnerID=8YFLogxK
U2 - 10.1016/j.engappai.2023.106360
DO - 10.1016/j.engappai.2023.106360
M3 - Article
AN - SCOPUS:85158039556
SN - 0952-1976
VL - 123
JO - Engineering Applications of Artificial Intelligence
JF - Engineering Applications of Artificial Intelligence
M1 - 106360
ER -