Online object tracking based on CNN with spatial-temporal saliency guided sampling

Peng Zhang, Tao Zhuo, Wei Huang, Kangli Chen, Mohan Kankanhalli

Research output: Contribution to journal, Article, peer-review

58 Scopus citations

Abstract

Tracking arbitrary objects is hard because of continual intrinsic and extrinsic variations in realistic scenarios. Even for the popular tracking-by-learning strategies, effective appearance modeling of non-rigid objects remains challenging because the targets’ articulated deformations occur on the fly and can heavily degrade the discriminative capability of the visual features generated online. As deep learning has proved successful for feature extraction in various recognition tasks, a growing number of deep models such as CNNs have been shown to improve the performance of online tracking. However, relying only on the outputs of the last CNN layer is not an optimal representation, since the coarse spatial resolution cannot guarantee the accurate localization that a qualified sampling process requires. Especially when objects undergo severe deformations, sampling from a region with a pre-defined scale inevitably leads to poor online learning. To overcome this limitation of CNN-based tracking, in this work we incorporate spatial-temporal saliency detection to guide more accurate target localization for qualified sampling within an inter-frame motion flow map. With an optional strategy for combining the outputs of intra-frame appearance correlations and inter-frame motion saliency, based on a compositional energy optimization, the proposed tracker shows superior performance compared with other state-of-the-art trackers on both challenging non-rigid and generic tracking benchmark datasets.

Original language: English
Pages (from-to): 115-127
Number of pages: 13
Journal: Neurocomputing
Volume: 257
State: Published - 27 Sep 2017

Keywords

  • CNN
  • Saliency
  • Sampling
  • Spatial-temporal
  • Tracking
