Visual Object Tracking with Multi-Frame Distractor Suppression

Yamin Han, Mingyu Cai, Jie Wu, Zhixuan Bai, Tao Zhuo, Hongming Zhang, Yanning Zhang

Research output: Contribution to journal › Article › peer-review

Abstract

With the rapid development of CNNs and Transformers, current mainstream approaches use an image patch as the reference for the target during tracking; these are known as template matching-based trackers. However, most existing template matching-based trackers consider only per-frame localization accuracy and neglect the dependencies of potential distractors (similar objects) across multiple video frames, which poses a fundamental challenge in template matching-based tracking. In this work, we propose a novel comprehensive framework with multi-frame distractor suppression for visual object tracking (MFDSTrack), which explicitly models the temporal history of both the target object and potential distractors. Specifically, we utilize a universal target candidate generation module to detect target candidates (both the target and distractors), providing a holistic view of the scene. In addition, a temporal and distractor-aware association module is designed to suppress multi-frame distractors by adopting a simple encoder-decoder Transformer architecture. The encoder takes the history of the target candidates as input, while the decoder takes the current target candidate queries and the encoder output as inputs to associate the current queries with historical trajectories. We extensively evaluate our trackers, MFDSTrack-SD, MFDSTrack-OS, MFDSTrack-GRM, and MFDSTrack-LT, on the LaSOT, LaSOT_ext, TrackingNet, GOT-10k, UAV123, NFS, and OTB100 benchmarks. The experiments show that our methods outperform previous state-of-the-art trackers on these seven tracking benchmarks.
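
The association step described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it replaces the encoder-decoder Transformer with a single scaled dot-product cross-attention step, and the names `associate`, `queries`, and `memory`, as well as the 4-dimensional embeddings, are hypothetical choices made only for illustration. It assumes each historical trajectory has already been summarized as one embedding vector (standing in for the encoder output) and shows how current candidate queries might be matched to those trajectories.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def associate(queries, memory):
    """Match each current candidate query to one historical trajectory.

    queries: embedding vectors for the current-frame target candidates
    memory:  one embedding vector per historical trajectory
             (hypothetical stand-in for the encoder output)
    Returns (attention_weights, assignments).
    """
    dim = len(memory[0])
    weights, assignments = [], []
    for q in queries:
        # Scaled dot-product score between this query and every trajectory.
        scores = [sum(a * b for a, b in zip(q, m)) / math.sqrt(dim) for m in memory]
        row = softmax(scores)
        weights.append(row)
        # Assign the query to the trajectory with the highest attention weight.
        assignments.append(max(range(len(row)), key=row.__getitem__))
    return weights, assignments

# Toy data (assumed): trajectory 0 is the target, trajectory 1 is a distractor.
memory = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
queries = [[0.1, 0.9, 0.0, 0.0], [0.8, 0.2, 0.0, 0.0]]
weights, assignments = associate(queries, memory)

# The candidate assigned to trajectory 0 is taken as the target in this frame.
target_candidate = assignments.index(0)
```

In this toy setup the first candidate is matched to the distractor trajectory and the second to the target trajectory, so the second candidate is selected as the target. A learned model would produce the embeddings and attention weights through trained layers rather than raw dot products.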

Original language: English
Pages (from-to): 2556-2569
Number of pages: 14
Journal: IEEE Transactions on Circuits and Systems for Video Technology
Volume: 35
Issue number: 3
DOI
Publication status: Published - 2025
