Unidirectional Cross-Modal Fusion for RGB-T Tracking

Xiao Guo, Hangfei Li, Yufei Zha, Peng Zhang

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

Abstract

The key issue in RGB-T tracking is obtaining an effective multimodal representation of targets from complementary RGB and thermal infrared (TIR) information. Previous methods based on template fusion or bidirectional search-template interaction can weaken the target representation because they admit noise from both templates and search regions. Meanwhile, directly fusing search features alone, without interaction with the templates, cannot fully exploit target-relevant contextual information. To mitigate these issues, we present UCTrack, which fuses complementary multimodal search features conditioned on undisturbed RGB and TIR template features. Specifically, we design a Unidirectional Cross-modal Fusion (UCF) module that minimizes the influence of background noise on templates by pruning the unnecessary template-to-search cross-modal interaction, and that mutually enhances RGB and TIR search features with target-relevant information through multimodal spatial fusion. The module is integrated into different layers of a ViT backbone to facilitate feature extraction and cross-modal fusion for RGB-T tracking. With the UCF module, UCTrack represents multimodal target features effectively and accurately without the unnecessary template-to-search interaction flow or direct template fusion; to the best of our knowledge, it is the first proposal of a unidirectional cross-modal fusion paradigm for RGB-T tracking. Extensive experiments on three popular RGB-T tracking benchmarks demonstrate that our method achieves state-of-the-art performance.
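
The idea of one-way interaction can be illustrated with an attention mask. The sketch below is not the UCF module from the paper; it is a minimal illustration under assumptions that the abstract does not confirm: tokens are ordered as RGB template, TIR template, RGB search, TIR search; template queries may attend only to their own template tokens (so templates stay undisturbed and are not fused directly); and search queries may attend to every token. The function names `build_unidirectional_mask` and `masked_softmax` are hypothetical.

```python
import math


def build_unidirectional_mask(n_template, n_search):
    """Build allowed[q][k] for an assumed token order:
    [RGB template, TIR template, RGB search, TIR search]."""
    groups = (
        ["rgb_t"] * n_template
        + ["tir_t"] * n_template
        + ["rgb_s"] * n_search
        + ["tir_s"] * n_search
    )

    def allowed(query_group, key_group):
        if query_group.endswith("_t"):
            # Assumption: a template token sees only its own template,
            # so search-region noise never reaches it.
            return key_group == query_group
        # Assumption: a search token sees both templates and both search regions.
        return True

    mask = [[allowed(q, k) for k in groups] for q in groups]
    return groups, mask


def masked_softmax(scores, mask_row):
    """Softmax over one row of attention scores, giving zero weight to masked keys."""
    kept = [s for s, keep in zip(scores, mask_row) if keep]
    peak = max(kept)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) if keep else 0.0 for s, keep in zip(scores, mask_row)]
    total = sum(exps)
    return [e / total for e in exps]


groups, mask = build_unidirectional_mask(n_template=2, n_search=3)
# Attention weights of the first RGB template token under uniform scores.
template_weights = masked_softmax([0.0] * len(groups), mask[0])
```

With uniform scores, the template row spreads its weight over the two RGB template tokens only, while any search row would spread its weight over all ten tokens.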

Original language: English
Title of host publication: ECAI 2024 - 27th European Conference on Artificial Intelligence, Including 13th Conference on Prestigious Applications of Intelligent Systems, PAIS 2024, Proceedings
Editors: Ulle Endriss, Francisco S. Melo, Kerstin Bach, Alberto Bugarin-Diz, Jose M. Alonso-Moral, Senen Barro, Fredrik Heintz
Publisher: IOS Press BV
Pages: 490-497
Number of pages: 8
ISBN (Electronic): 9781643685489
DOIs
State: Published - 16 Oct 2024
Event: 27th European Conference on Artificial Intelligence, ECAI 2024 - Santiago de Compostela, Spain
Duration: 19 Oct 2024 - 24 Oct 2024

Publication series

Name: Frontiers in Artificial Intelligence and Applications
Volume: 392
ISSN (Print): 0922-6389
ISSN (Electronic): 1879-8314

Conference

Conference: 27th European Conference on Artificial Intelligence, ECAI 2024
Country/Territory: Spain
City: Santiago de Compostela
Period: 19/10/24 - 24/10/24
