TY - JOUR
T1 - Target-Aware Transformer for Satellite Video Object Tracking
AU - Lai, Pujian
AU - Zhang, Meili
AU - Cheng, Gong
AU - Li, Shengyang
AU - Huang, Xiankai
AU - Han, Junwei
N1 - Publisher Copyright: © 1980-2012 IEEE.
PY - 2024
Y1 - 2024
AB - Recent years have witnessed rapid development of the transformer-based paradigm for single object tracking (SOT) in generic videos. However, because the targets of interest in satellite videos are small and visually weak, progress of the transformer-based paradigm in satellite video object tracking has been impeded. To alleviate this issue, a novel transformer-based approach is proposed, consisting of a bi-direction propagation and fusion (Bi-PF) strategy and a target-aware enhancement (TAE) module. Concretely, we first adopt the Bi-PF strategy to make full use of multiscale information and generate discriminative representations of tracking targets. Then, the TAE module decouples an object query into a content-aware embedding and a spatial-aware embedding and produces a target prototype that helps obtain a high-quality content-aware embedding. Unlike previous satellite video tracking methods, most of which are evaluated on only a few videos, we conduct extensive experiments on the SatSOT dataset, which consists of 105 videos. In particular, the proposed method achieves a success score of 45.6% and a precision score of 57.6%, surpassing the baseline method by 5.0% and 9.5%, respectively. The code will be released at https://github.com/laybebe/TATrans_SVOT .
KW - Bi-direction propagation and fusion (Bi-PF)
KW - satellite video object tracking
KW - target-aware enhancement (TAE)
UR - http://www.scopus.com/inward/record.url?scp=85179813396&partnerID=8YFLogxK
U2 - 10.1109/TGRS.2023.3339658
DO - 10.1109/TGRS.2023.3339658
M3 - Article
AN - SCOPUS:85179813396
SN - 0196-2892
VL - 62
SP - 1
EP - 10
JO - IEEE Transactions on Geoscience and Remote Sensing
JF - IEEE Transactions on Geoscience and Remote Sensing
M1 - 5601410
ER -