TY - GEN
T1 - MRM-RETrack
T2 - 8th Chinese Conference on Pattern Recognition and Computer Vision, PRCV 2025
AU - He, Yuting
AU - Fan, Bin
AU - Wan, Zhexiong
AU - Zhang, Zhiyuan
AU - Dai, Yuchao
N1 - Publisher Copyright: © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2026.
PY - 2026
Y1 - 2026
AB - In recent years, RGB-event object tracking has achieved significant progress, demonstrating its increasingly enhanced perception and tracking capabilities in dynamic scenes. However, existing methods are predominantly based on CNN or Transformer architectures, which typically suffer from high computational complexity and memory overhead. The emerging Mamba architecture, while preserving the ability to model long-range dependencies, significantly reduces memory consumption, opening new avenues for the design of efficient tracking models. Nevertheless, current Mamba-based RGB-event tracking methods still face challenges such as insufficient feature learning and lack of cross-modal alignment, thereby impacting tracking accuracy and overall robustness. This paper proposes a novel RGB-event tracking framework, aiming to achieve high-performance, low-memory cross-modal object tracking. Specifically, we introduce a hierarchical local-global feature extraction strategy, integrating a Multi-Scale Residual Module (MSRM) and a Gated Mamba Module (GMM), to collaboratively enhance both fine-grained local feature extraction and long-range dependency capture. Furthermore, we develop an efficient Aligned Difference-Enhanced Mamba module (ADE-Mamba), which explicitly aligns complementary contextual features by focusing on inter-modal discrepancies. To further boost tracking performance, we design an adaptive dual-modal tracking head that dynamically adjusts and fuses the contributions from the RGB and event modalities, enabling precise target localization. Extensive experiments on multiple benchmark datasets demonstrate that our method exhibits superior performance in both short-term and long-term tracking tasks.
KW - Event Camera
KW - Mamba
KW - Multimodal Fusion
KW - Object Tracking
UR - https://www.scopus.com/pages/publications/105028993616
U2 - 10.1007/978-981-95-5764-6_12
DO - 10.1007/978-981-95-5764-6_12
M3 - Conference contribution
AN - SCOPUS:105028993616
SN - 9789819557639
T3 - Lecture Notes in Computer Science
SP - 162
EP - 177
BT - Pattern Recognition and Computer Vision - 8th Chinese Conference, PRCV 2025, Proceedings
A2 - Kittler, Josef
A2 - Xiong, Hongkai
A2 - Lin, Weiyao
A2 - Yang, Jian
A2 - Chen, Xilin
A2 - Lu, Jiwen
A2 - Yu, Jingyi
A2 - Zheng, Weishi
PB - Springer Science and Business Media Deutschland GmbH
Y2 - 15 October 2025 through 18 October 2025
ER -