HTACPE: A Hybrid Transformer With Adaptive Content and Position Embedding for Sample Learning Efficiency of Hyperspectral Tracker

Ye Wang, Shaohui Mei, Mingyang Ma, Yuheng Liu, Yuru Su

Research output: Contribution to journal › Article › peer-review

1 Scopus citation

Abstract

The transformer architecture has demonstrated significant potential in hyperspectral object tracking, since its global correlation learning can accurately represent the data distribution. However, existing transformer-based hyperspectral object trackers typically rely on costly pre-trained models, which makes them prone to overfitting, and to the resulting performance collapse, when fine-tuned on small-scale hyperspectral videos. To address this challenge, this paper proposes a Hybrid Transformer with Adaptive Content and Position Embedding (HTACPE) tracker that improves the learning efficiency of the tracking model and fully explores spectral-spatial information. Specifically, an Adaptive Content and Position Embedding Module (ACPEM) is designed to dynamically learn the balance between positional and content-based information, which allows the model to handle datasets of various sizes effectively. To enhance the spectral-spatial information, a Spectral Grouping Module (SGM) is designed to learn high-frequency information in complex scenarios, thereby diversifying the learned features; the SGM operates in parallel with the ACPEM feature learning module. Furthermore, a Dynamic Reliability Refinement Module (DRRM) is incorporated to improve the accuracy of object position perception by iteratively refining the prediction parameters, which enhances the reliability of the model. Extensive experiments demonstrate that the proposed HTACPE achieves satisfactory tracking performance both qualitatively and quantitatively, especially when training data are insufficient.
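
The abstract does not specify how the ACPEM balances positional and content-based information. As a purely illustrative sketch, and not the authors' implementation, one common way to learn such a balance is to blend a content-based attention distribution with a position-based one through a learnable gate. All names below (`gated_attention_weights`, `gate_logit`, the toy scores) are hypothetical and chosen only for this example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    """Logistic function mapping a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_attention_weights(content_scores, position_scores, gate_logit):
    """Blend content-based and position-based attention for one query token.

    gate_logit is a hypothetical learnable scalar (an assumption of this
    sketch): sigmoid(gate_logit) near 1 favours positional attention,
    near 0 favours content attention.
    """
    g = sigmoid(gate_logit)
    content_attn = softmax(content_scores)
    position_attn = softmax(position_scores)
    # A convex combination of two distributions is itself a distribution.
    return [(1.0 - g) * c + g * p for c, p in zip(content_attn, position_attn)]


# Toy example: one query attending over three tokens.
content_scores = [2.0, 0.5, 0.1]   # query-key similarity (content)
position_scores = [0.0, 1.0, 3.0]  # relative-position prior

weights_content = gated_attention_weights(content_scores, position_scores, -10.0)
weights_position = gated_attention_weights(content_scores, position_scores, 10.0)
weights_mixed = gated_attention_weights(content_scores, position_scores, 0.0)
```

With a strongly negative gate the weights follow the content scores, with a strongly positive gate they follow the positional prior, and a gate of zero averages the two. In a trained model the gate would be a parameter updated by gradient descent, which is one way a model could shift toward positional cues when data are scarce; whether HTACPE does this is not stated in the abstract.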

Original language: English
Pages (from-to): 2384-2398
Number of pages: 15
Journal: IEEE Transactions on Multimedia
Volume: 27
State: Published - 2025

Keywords

  • Adaptive content and position embedding
  • diversified feature
  • hybrid transformer
  • hyperspectral object tracking
