TY - GEN
T1 - Deep Adaptive Discriminate Siamese Network with Multi-Level Response for Visual Object Tracking
AU - Wang, Yichen
AU - Mao, Zhaoyong
AU - Wang, Xin
AU - Ren, Jing
AU - Meng, Chenlin
AU - Shen, Junge
N1 - Publisher Copyright: © 2023 IEEE.
PY - 2023
Y1 - 2023
AB - Visual object tracking has been intensively studied for its role in traffic surveillance, human action recognition, and autonomous driving. Siamese network-based methods have demonstrated a satisfactory trade-off between precision and efficiency for visual tracking. Nevertheless, Siamese trackers lose accuracy when predicting the target's location under background clutter, illumination change, scale variation, deformation, fast motion, and similar challenges. We propose a deep adaptive discriminative Siamese network equipped with a fusion scheme for multi-level responses. To enhance the feature discriminability of the Siamese network, we introduce a novel residual channel attention clipping unit. This unit integrates residual connections with channel attention, improving both optimization and feature representation in the network. We then introduce a multi-response adaptive fusion structure that takes advantage of low-level, mid-level, and high-level features, yielding a comprehensive score map that reflects multiple levels of semantics. Experiments show that our tracker performs favorably against current leading trackers on widely used public tracking datasets such as OTB-2015 and GOT-10k. The method attains an AUC score of 0.655 on OTB-2015 while maintaining a processing speed of 63 FPS.
KW - Channel attention
KW - Multi-response fusion structure
KW - Siamese network
KW - Visual tracking
UR - https://www.scopus.com/pages/publications/85172118909
U2 - 10.1109/ICFEICT59519.2023.00042
DO - 10.1109/ICFEICT59519.2023.00042
M3 - Conference contribution
AN - SCOPUS:85172118909
T3 - Proceedings - 2023 3rd International Conference on Frontiers of Electronics, Information and Computation Technologies, ICFEICT 2023
SP - 197
EP - 203
BT - Proceedings - 2023 3rd International Conference on Frontiers of Electronics, Information and Computation Technologies, ICFEICT 2023
A2 - Liu, Weijian
A2 - Wang, Zhuo Zheng
A2 - You, Peng
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 3rd International Conference on Frontiers of Electronics, Information and Computation Technologies, ICFEICT 2023
Y2 - 26 May 2023 through 29 May 2023
ER -