
Efficient Plug-and-Play Mamba-based Selective Target State Modeling for Lightweight Visual Tracking

  • Northwestern Polytechnical University Xian

Research output: Contribution to journal, article, peer-reviewed

Abstract

Temporal target state modeling is essential for robust visual tracking, yet existing approaches often rely on deep or iterative architectures that incur excessive computational and parametric costs, making them unsuitable for lightweight real-time deployment. To address this challenge, we propose TSTrack, the first lightweight tracking framework that integrates a State Space Model (SSM). TSTrack consists of two modules: (1) Target-Aware Mamba (TAM), which enables low-latency adaptive temporal modeling by fusing real-time search states, static templates, and compressed historical context in Mamba's hidden states through selective bidirectional interaction. As a plug-and-play module, TAM improves the performance of existing lightweight trackers with only a minor increase in parameters and computation (approximately +6% parameters and +1% MACs); (2) Spatial-Channel Aggregation Module (SCAM), which hierarchically refines target features via dual-path attention that coordinates spatial activation enhancement and channel-wise feature recalibration, boosting localization precision in complex scenarios. TSTrack achieves promising tracking speed with competitive tracking performance. For instance, it surpasses previous lightweight tracking methods across six commonly used tracking benchmarks, including LaSOT, GOT-10k, and TrackingNet, while running at 41 fps on CPU devices. This work redefines efficiency-accuracy trade-offs in lightweight visual tracking and advances temporal modeling theory in resource-constrained computer vision tasks.

Original language: English
Journal: IEEE Transactions on Multimedia
DOI
Publication status: Accepted/In press - 2026
