TY - JOUR
T1 - Efficient Plug-and-Play Mamba-based Selective Target State Modeling for Lightweight Visual Tracking
AU - Chen, Yao
AU - Jia, Guancheng
AU - Zha, Yufei
AU - Zhang, Peng
N1 - Publisher Copyright:
© 1999-2012 IEEE.
PY - 2026
Y1 - 2026
AB - Temporal target state modeling is essential for robust visual tracking, yet existing approaches often rely on deep or iterative architectures whose computational and parametric costs make them unsuitable for lightweight real-time deployment. To address this challenge, we propose TSTrack, the first lightweight tracking framework that integrates a State Space Model (SSM). TSTrack consists of two modules: (1) Target-Aware Mamba (TAM), which enables low-latency adaptive temporal modeling by fusing real-time search states, static templates, and compressed historical context in Mamba's hidden states through selective bidirectional interaction. As a plug-and-play module, TAM improves the performance of existing lightweight trackers with only a minor increase in parameters and computation (approximately +6% parameters and +1% MACs); (2) Spatial-Channel Aggregation Module (SCAM), which hierarchically refines target features via dual-path attention that coordinates spatial activation enhancement and channel-wise feature recalibration, improving localization precision in complex scenarios. TSTrack achieves high tracking speed with competitive tracking accuracy. For instance, it surpasses previous lightweight tracking methods on six commonly used tracking benchmarks, including LaSOT, GOT-10k, and TrackingNet, while running at 41 fps on CPU devices. This work redefines the efficiency-accuracy trade-off in lightweight visual tracking and advances temporal modeling theory in resource-constrained computer vision tasks.
KW - Plug-and-play paradigm
KW - State space model
KW - Target state modeling
KW - Visual object tracking
UR - https://www.scopus.com/pages/publications/105035202410
U2 - 10.1109/TMM.2026.3678806
DO - 10.1109/TMM.2026.3678806
M3 - Article
AN - SCOPUS:105035202410
SN - 1520-9210
JO - IEEE Transactions on Multimedia
JF - IEEE Transactions on Multimedia
ER -