TY - JOUR
T1 - SiamPAT
T2 - Siamese point attention networks for robust visual tracking
AU - Chen, Hang
AU - Zhang, Weiguo
AU - Yan, Danghui
N1 - Publisher Copyright: © 2021 SPIE and IS&T.
PY - 2021/9/1
Y1 - 2021/9/1
AB - The attention mechanism originates from the study of human visual behavior. It has been widely used in various fields of artificial intelligence in recent years and has become an important part of neural network architectures. Many attention-based trackers have achieved improved accuracy and robustness. However, these trackers cannot accurately suppress the influence of background information and distractors, and they do not enhance the target object information, which limits their performance. We propose new Siamese point attention (SPA) networks for robust visual tracking. SPA networks learn position attention and channel attention jointly from the information of both branches. To construct point attention, each point on the template feature is used to calculate the similarity on the search feature. The similarity calculation is based on the local information of the target object, which can reduce the influence of background, deformation, and rotation. We obtain the region of interest by calculating the position attention from the point attention. Position attention is integrated into the calculation of channel attention to reduce the influence of irrelevant areas. In addition, we propose object attention and integrate it into the classification and regression module to further enhance the semantic information of the target object and improve tracking accuracy. Extensive experiments are conducted on five benchmark datasets. The experimental results show that our method achieves state-of-the-art performance.
KW - attention mechanism
KW - object attention
KW - Siamese point attention
KW - visual tracking
UR - http://www.scopus.com/inward/record.url?scp=85118787188&partnerID=8YFLogxK
U2 - 10.1117/1.JEI.30.5.053001
DO - 10.1117/1.JEI.30.5.053001
M3 - Article
AN - SCOPUS:85118787188
SN - 1017-9909
VL - 30
JO - Journal of Electronic Imaging
JF - Journal of Electronic Imaging
IS - 5
M1 - 053001
ER -