Multi-scale feature extraction and fusion with attention interaction for RGB-T tracking

Haijiao Xing, Wei Wei, Lei Zhang, Yanning Zhang

Research output: Contribution to journal › Article › peer-review

1 Citation (Scopus)

Abstract

RGB-T single-object tracking aims to track objects using both RGB images and thermal infrared (TIR) images. Although Siamese-based RGB-T trackers have an advantage in tracking speed, their accuracy still falls short of other state-of-the-art trackers (e.g., MDNet). In this study, we revisit existing Siamese-based RGB-T trackers and find that this shortfall stems from insufficient feature fusion between the RGB and TIR images, as well as incomplete interaction between the template frame and the search frame. Motivated by this, we propose a multi-scale feature extraction and fusion network with Temporal-Spatial Memory (MFATrack). Instead of fusing the RGB and TIR images with a single-scale feature map, or with only the high-level features of a multi-scale feature map, MFATrack adopts a new fusion strategy that fuses features from all scales, capturing contextual information in shallow layers and details in the deep layer. To learn features better suited to the tracking task, MFATrack fuses features across several consecutive frames. In addition, we propose a self-attention interaction module designed specifically for the search frame; it highlights the search-frame features relevant to the target and thus facilitates rapid convergence for target localization. Experimental results demonstrate that MFATrack is not only fast but also achieves better tracking accuracy than competing methods, including MDNet-based methods and other Siamese-based trackers.
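The self-attention interaction module described above re-weights search-frame features by their relevance to one another. As an illustration only (the paper's exact MFATrack layers, projections, and dimensions are not given in this abstract), here is a minimal stdlib-Python sketch of single-head scaled dot-product self-attention over a toy feature sequence:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def self_attention(feats):
    """Toy single-head self-attention: each feature vector attends to all
    others. Queries, keys, and values are the raw features themselves
    (no learned projections -- an assumption for illustration only)."""
    d = len(feats[0])
    out = []
    for q in feats:
        # Scaled dot-product scores against every key
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in feats]
        w = softmax(scores)
        # Output is a convex combination of the value vectors
        out.append([sum(wi * v[j] for wi, v in zip(w, feats))
                    for j in range(d)])
    return out

# Hypothetical search-frame features at one scale (3 positions, 4 dims)
feats = [[1.0, 0.0, 0.0, 0.0],
         [0.0, 1.0, 0.0, 0.0],
         [1.0, 1.0, 0.0, 0.0]]
attended = self_attention(feats)
```

In a real tracker the queries would typically come from learned projections and the attention would run per scale before fusion; this sketch only shows the weighting mechanism itself.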

Original language: English
Article number: 110917
Journal: Pattern Recognition
Volume: 157
DOI
Publication status: Published - Jan 2025
