TY - JOUR
T1 - HTACPE
T2 - A Hybrid Transformer With Adaptive Content and Position Embedding for Sample Learning Efficiency of Hyperspectral Tracker
AU - Wang, Ye
AU - Mei, Shaohui
AU - Ma, Mingyang
AU - Liu, Yuheng
AU - Su, Yuru
N1 - Publisher Copyright: © 1999-2012 IEEE.
PY - 2025
Y1 - 2025
AB - Transformer architecture has demonstrated significant potential in hyperspectral object tracking by leveraging global correlation learning to accurately represent the data distribution. However, existing transformer-based hyperspectral object trackers typically rely on costly pre-trained models, which makes them prone to collapse from overfitting when fine-tuned on small-scale hyperspectral videos and greatly limits their performance. To address this challenge, this paper proposes a Hybrid Transformer with Adaptive Content and Position Embedding (HTACPE) tracker that improves the learning efficiency of the tracking model and fully exploits spectral-spatial information. Specifically, an Adaptive Content and Position Embedding Module (ACPEM) is designed to dynamically learn the balance between positional and content-based information, allowing the model to handle datasets of various sizes effectively. To enhance spectral-spatial information, a Spectral Grouping Module (SGM), operating in parallel with the ACPEM feature learning module, is designed to learn high-frequency information in complex scenarios and thereby produce diversified features. Furthermore, a Dynamic Reliability Refinement Module (DRRM) is incorporated to address the challenge of accurate object position perception; it iteratively refines prediction parameters to enhance the reliability of the model. Extensive experiments demonstrate that the proposed HTACPE achieves satisfactory tracking performance both qualitatively and quantitatively, especially with insufficient training data.
KW - Adaptive content and position embedding
KW - diversified feature
KW - hybrid transformer
KW - hyperspectral object tracking
UR - http://www.scopus.com/inward/record.url?scp=85214308235&partnerID=8YFLogxK
U2 - 10.1109/TMM.2024.3521819
DO - 10.1109/TMM.2024.3521819
M3 - Article
AN - SCOPUS:85214308235
SN - 1520-9210
VL - 27
SP - 2384
EP - 2398
JO - IEEE Transactions on Multimedia
JF - IEEE Transactions on Multimedia
ER -