Visual Object Tracking with Multi-Frame Distractor Suppression

Yamin Han, Mingyu Cai, Jie Wu, Zhixuan Bai, Tao Zhuo, Hongming Zhang, Yanning Zhang

Research output: Contribution to journal › Article › peer-review

Abstract

With the rapid development of CNNs and Transformers, current mainstream approaches use an image patch as the reference of the target to perform tracking; these are known as template matching-based trackers. However, most existing template matching-based trackers consider only per-frame localization accuracy and neglect the dependencies among potential distractors (similar objects) across multiple video frames, which poses a fundamental challenge for template matching-based tracking. In this work, we propose a novel comprehensive framework with multi-frame distractor suppression for visual object tracking (MFDSTrack), which explicitly models the temporal history of both the target object and potential distractors. Specifically, we use a universal target candidate generation module to detect target candidates (both the target and distractors), providing a holistic view of the scene. In addition, a temporal and distractor-aware association module is designed to suppress multi-frame distractors using a simple encoder-decoder Transformer architecture. The encoder takes the history of the target candidates as input, while the decoder takes the current target candidate queries and the encoder output as inputs and associates the current target candidate queries with historical trajectories. We extensively evaluate our trackers, MFDSTrack-SD, MFDSTrack-OS, MFDSTrack-GRM, and MFDSTrack-LT, on the LaSOT, LaSOText, TrackingNet, GOT-10k, UAV123, NFS, and OTB100 benchmarks. The experiments show that our methods outperform previous state-of-the-art trackers on these seven tracking benchmarks.
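
The association step described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it assumes each target candidate is already represented by an embedding vector, replaces the learned encoder with a plain average over each trajectory's history, and replaces the learned decoder with a single scaled dot-product cross-attention step. The names `summarize_history` and `associate`, and all numeric values, are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def summarize_history(track):
    """Assumed stand-in for the encoder: average a trajectory's per-frame embeddings."""
    dim = len(track[0])
    return [sum(frame[i] for frame in track) / len(track) for i in range(dim)]


def associate(history, queries):
    """Assumed stand-in for the decoder: each current candidate query attends
    over the summarized trajectories and is assigned to the one with the
    highest attention weight."""
    track_ids = list(history)
    memory = [summarize_history(history[t]) for t in track_ids]
    scale = math.sqrt(len(memory[0]))
    results = []
    for query in queries:
        weights = softmax([dot(query, m) / scale for m in memory])
        best = max(range(len(track_ids)), key=weights.__getitem__)
        results.append((track_ids[best], weights))
    return results


# Hypothetical two-dimensional embeddings for two past frames per trajectory.
history = {
    "target": [[1.0, 0.0], [0.9, 0.1]],
    "distractor": [[0.0, 1.0], [0.1, 0.9]],
}
# Two candidates detected in the current frame.
queries = [[0.1, 1.0], [1.0, 0.05]]

matches = associate(history, queries)
labels = [label for label, _ in matches]
print(labels)
```

In this sketch the first current candidate is linked to the distractor trajectory and the second to the target trajectory, which is the sense in which keeping a history for distractors as well as the target lets a tracker reject a look-alike in the current frame.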

Original language: English
Pages (from-to): 2556-2569
Number of pages: 14
Journal: IEEE Transactions on Circuits and Systems for Video Technology
Volume: 35
Issue number: 3
State: Published - 2025

Keywords

  • Visual object tracking
  • multi-frame distractor
  • temporal and distractor-aware transformer
