HTACPE: A Hybrid Transformer With Adaptive Content and Position Embedding for Sample Learning Efficiency of Hyperspectral Tracker

Ye Wang; Shaohui Mei; Mingyang Ma; Yuheng Liu; Yuru Su

doi:10.1109/TMM.2024.3521819

School of Electronics and Information

Northwestern Polytechnical University Xian

Research output: Contribution to journal › Article › peer-review

1 Scopus citation

Abstract

Transformer architecture has demonstrated significant potential in hyperspectral object tracking by leveraging global correlation learning to accurately represent the data distribution. However, existing hyperspectral object trackers based on transformer models typically rely on costly pre-trained models, making them prone to crashing due to overfitting when tuned on small-scale hyperspectral videos, greatly limiting their performance. To address this challenge, in this paper, a Hybrid Transformer with Adaptive Content and Position Embedding (HTACPE) tracker is proposed to improve the learning efficiency of the tracking model, and fully explore the spectral-spatial information. Specifically, an Adaptive Content and Position Embedding Module (ACPEM) is designed to dynamically learn the balance between focusing on positional and content-based information, which allows the model to effectively handle datasets of various sizes. To enhance the spectral-spatial information, a Spectral Grouping Module (SGM) is designed to learn the high-frequency information in complex scenarios, thereby enhancing diversified features. It operates in parallel with the ACPEM feature learning module. Furthermore, a Dynamic Reliability Refinement Module (DRRM) is incorporated to address challenges related to accurate object position perception, iteratively refining prediction parameters to enhance the reliability of the model. Extensive experiments demonstrate that the proposed HTACPE achieves satisfactory tracking performance both qualitatively and quantitatively, especially with insufficient training data.

Original language	English
Pages (from-to)	2384-2398
Number of pages	15
Journal	IEEE Transactions on Multimedia
Volume	27
DOIs	https://doi.org/10.1109/TMM.2024.3521819
State	Published - 2025

Keywords

Adaptive content and position embedding
diversified feature
hybrid transformer
hyperspectral object tracking

Access to Document

10.1109/TMM.2024.3521819

Cite this

@article{110f15ca5ecd4e4282c5d67eb90e1560,

title = "HTACPE: A Hybrid Transformer With Adaptive Content and Position Embedding for Sample Learning Efficiency of Hyperspectral Tracker",

abstract = "Transformer architecture has demonstrated significant potential in hyperspectral object tracking by leveraging global correlation learning to accurately represent the data distribution. However, existing hyperspectral object trackers based on transformer models typically rely on costly pre-trained models, making them prone to crashing due to overfitting when tuned on small-scale hyperspectral videos, greatly limiting their performance. To address this challenge, in this paper, a Hybrid Transformer with Adaptive Content and Position Embedding (HTACPE) tracker is proposed to improve the learning efficiency of the tracking model, and fully explore the spectral-spatial information. Specifically, an Adaptive Content and Position Embedding Module (ACPEM) is designed to dynamically learn the balance between focusing on positional and content-based information, which allows the model to effectively handle datasets of various sizes. To enhance the spectral-spatial information, a Spectral Grouping Module (SGM) is designed to learn the high-frequency information in complex scenarios, thereby enhancing diversified features. It operates in parallel with the ACPEM feature learning module. Furthermore, a Dynamic Reliability Refinement Module (DRRM) is incorporated to address challenges related to accurate object position perception, iteratively refining prediction parameters to enhance the reliability of the model. Extensive experiments demonstrate that the proposed HTACPE achieves satisfactory tracking performance both qualitatively and quantitatively, especially with insufficient training data.",

keywords = "Adaptive content and position embedding, diversified feature, hybrid transformer, hyperspectral object tracking",

author = "Ye Wang and Shaohui Mei and Mingyang Ma and Yuheng Liu and Yuru Su",

note = "Publisher Copyright: {\textcopyright} 1999-2012 IEEE.",

year = "2025",

doi = "10.1109/TMM.2024.3521819",

language = "English",

volume = "27",

pages = "2384--2398",

journal = "IEEE Transactions on Multimedia",

issn = "1520-9210",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

}

TY - JOUR

T1 - HTACPE: A Hybrid Transformer With Adaptive Content and Position Embedding for Sample Learning Efficiency of Hyperspectral Tracker

AU - Wang, Ye

AU - Mei, Shaohui

AU - Ma, Mingyang

AU - Liu, Yuheng

AU - Su, Yuru

PY - 2025

Y1 - 2025

AB - Transformer architecture has demonstrated significant potential in hyperspectral object tracking by leveraging global correlation learning to accurately represent the data distribution. However, existing hyperspectral object trackers based on transformer models typically rely on costly pre-trained models, making them prone to crashing due to overfitting when tuned on small-scale hyperspectral videos, greatly limiting their performance. To address this challenge, in this paper, a Hybrid Transformer with Adaptive Content and Position Embedding (HTACPE) tracker is proposed to improve the learning efficiency of the tracking model, and fully explore the spectral-spatial information. Specifically, an Adaptive Content and Position Embedding Module (ACPEM) is designed to dynamically learn the balance between focusing on positional and content-based information, which allows the model to effectively handle datasets of various sizes. To enhance the spectral-spatial information, a Spectral Grouping Module (SGM) is designed to learn the high-frequency information in complex scenarios, thereby enhancing diversified features. It operates in parallel with the ACPEM feature learning module. Furthermore, a Dynamic Reliability Refinement Module (DRRM) is incorporated to address challenges related to accurate object position perception, iteratively refining prediction parameters to enhance the reliability of the model. Extensive experiments demonstrate that the proposed HTACPE achieves satisfactory tracking performance both qualitatively and quantitatively, especially with insufficient training data.

KW - Adaptive content and position embedding

KW - diversified feature

KW - hybrid transformer

KW - hyperspectral object tracking

UR - http://www.scopus.com/inward/record.url?scp=85214308235&partnerID=8YFLogxK

U2 - 10.1109/TMM.2024.3521819

DO - 10.1109/TMM.2024.3521819

M3 - Article

AN - SCOPUS:85214308235

SN - 1520-9210

VL - 27

SP - 2384

EP - 2398

JO - IEEE Transactions on Multimedia

JF - IEEE Transactions on Multimedia

ER -
