Online object tracking based on CNN with spatial-temporal saliency guided sampling

Peng Zhang; Tao Zhuo; Wei Huang; Kangli Chen; Mohan Kankanhalli

doi:10.1016/j.neucom.2016.10.073

School of Computer Science

Research output: Contribution to journal › Article › peer-review

59 Scopus citations

Abstract

Tracking arbitrary objects is hard because of continual intrinsic and extrinsic variations in realistic scenarios. Even for the popular tracking-by-learning strategies, effective appearance modeling of non-rigid objects remains challenging because the targets' articulated deformations on the fly may heavily degrade the discriminative capability of the visual features generated online. As deep learning has proven successful for feature extraction in various recognition tasks, a growing number of deep models such as CNNs have been shown to improve the performance of online tracking. However, depending only on the outputs of the last CNN layer is not an optimal representation, since the coarse spatial resolution cannot guarantee the accurate localization that a qualified sampling process requires. In particular, when objects undergo severe deformations, sampling from a region with a pre-defined scale inevitably leads to poor online learning. To overcome this limitation of CNN-based tracking, in this work we incorporate spatial-temporal saliency detection to guide more accurate target localization for qualified sampling within an inter-frame motion flow map. With an optional strategy for combining the outputs of intra-frame appearance correlations and inter-frame motion saliency based on a compositional energy optimization, the proposed tracker shows superior performance compared with other state-of-the-art trackers on both challenging non-rigid and generic tracking benchmark datasets.
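As a rough illustration of the sampling idea in the abstract, the sketch below fuses a hypothetical appearance response map with a hypothetical motion saliency map and draws candidate sample centers in proportion to the fused score. The weighted sum, the weight `alpha`, the toy maps, and all function names are assumptions made for illustration; the paper's compositional energy optimization and CNN features are not reproduced here.

```python
import random


def normalize(grid):
    """Rescale a 2D list of non-negative scores so that all entries sum to 1."""
    total = sum(sum(row) for row in grid)
    if total <= 0:
        raise ValueError("map must contain some positive score")
    return [[value / total for value in row] for row in grid]


def fuse_maps(appearance, motion, alpha=0.5):
    """Weighted sum of two normalized maps.

    This is an assumed stand-in for the combination step, not the
    energy optimization described in the paper.
    """
    a, m = normalize(appearance), normalize(motion)
    return [
        [alpha * av + (1 - alpha) * mv for av, mv in zip(a_row, m_row)]
        for a_row, m_row in zip(a, m)
    ]


def sample_centers(fused, n, seed=0):
    """Draw n (row, col) cells with probability proportional to the fused score."""
    rng = random.Random(seed)
    cells = [(r, c) for r, row in enumerate(fused) for c in range(len(row))]
    weights = [fused[r][c] for r, c in cells]
    return rng.choices(cells, weights=weights, k=n)


# Toy 3x3 maps (hypothetical values): appearance peaks at the center,
# motion saliency covers the center and the cell to its right.
appearance = [[0, 1, 0], [1, 4, 1], [0, 1, 0]]
motion = [[0, 0, 0], [0, 2, 2], [0, 0, 0]]

fused = fuse_maps(appearance, motion)
centers = sample_centers(fused, 5)
print(centers)
```

Cells where both maps are zero receive zero weight, so no samples are drawn there; this is the sense in which saliency "guides" where training samples come from in this simplified picture.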

Original language	English
Pages (from-to)	115-127
Number of pages	13
Journal	Neurocomputing
Volume	257
DOIs	https://doi.org/10.1016/j.neucom.2016.10.073
State	Published - 27 Sep 2017

Keywords

CNN
Saliency
Sampling
Spatial-temporal
Tracking

Cite this

@article{68c57c60d6804a29bdcbdb4877af2e4f,

title = "Online object tracking based on CNN with spatial-temporal saliency guided sampling",

abstract = "Arbitrary tracking is hard due to nonstop intrinsic and extrinsic variations in realistic scenarios. Even for the popular tracking-by-learning strategies, effective appearance modeling of the non-rigid objects is still challenging because of the targets{\textquoteright} articulatory deformations on-the-fly, which may heavily degrade the discriminative capability of the online generated visual features. With widely emerged deep learning showing its success for feature extraction in different recognition tasks, more and more deep models such as CNN have been demonstrated contributive to improving the performance of online tracking. However, only depending on the outputs from last layer of CNN is not an optimum representation since the coarse spatial resolution cannot guarantee an accurate localization for a qualified sampling process, especially when objects have severe deformations, sampling from the region with a pre-defined scale would inevitably guide a poor online learning. To overcome such a limitation of CNN based tracking, in this work, we incorporated spatial-temporal saliency detection to guide a more accurate target localization for qualified sampling within an inter-frame motion flow map. With an optional strategy for the output combination of intra-frame appearance correlations and inter-frame motion saliency based on a compositional energy optimization, the proposed tracking has shown a superior performance in comparison to the other state-of-art trackers on both challenging non-rigid and generic tracking benchmark datasets.",

keywords = "CNN, Saliency, Sampling, Spatial-temporal, Tracking",

author = "Peng Zhang and Tao Zhuo and Wei Huang and Kangli Chen and Mohan Kankanhalli",

note = "Publisher Copyright: {\textcopyright} 2017 Elsevier B.V.",

year = "2017",

month = sep,

day = "27",

doi = "10.1016/j.neucom.2016.10.073",

language = "English",

volume = "257",

pages = "115--127",

journal = "Neurocomputing",

issn = "0925-2312",

publisher = "Elsevier B.V.",

}

TY - JOUR

T1 - Online object tracking based on CNN with spatial-temporal saliency guided sampling

AU - Zhang, Peng

AU - Zhuo, Tao

AU - Huang, Wei

AU - Chen, Kangli

AU - Kankanhalli, Mohan

PY - 2017/9/27

Y1 - 2017/9/27

N2 - Arbitrary tracking is hard due to nonstop intrinsic and extrinsic variations in realistic scenarios. Even for the popular tracking-by-learning strategies, effective appearance modeling of the non-rigid objects is still challenging because of the targets’ articulatory deformations on-the-fly, which may heavily degrade the discriminative capability of the online generated visual features. With widely emerged deep learning showing its success for feature extraction in different recognition tasks, more and more deep models such as CNN have been demonstrated contributive to improving the performance of online tracking. However, only depending on the outputs from last layer of CNN is not an optimum representation since the coarse spatial resolution cannot guarantee an accurate localization for a qualified sampling process, especially when objects have severe deformations, sampling from the region with a pre-defined scale would inevitably guide a poor online learning. To overcome such a limitation of CNN based tracking, in this work, we incorporated spatial-temporal saliency detection to guide a more accurate target localization for qualified sampling within an inter-frame motion flow map. With an optional strategy for the output combination of intra-frame appearance correlations and inter-frame motion saliency based on a compositional energy optimization, the proposed tracking has shown a superior performance in comparison to the other state-of-art trackers on both challenging non-rigid and generic tracking benchmark datasets.

AB - Arbitrary tracking is hard due to nonstop intrinsic and extrinsic variations in realistic scenarios. Even for the popular tracking-by-learning strategies, effective appearance modeling of the non-rigid objects is still challenging because of the targets’ articulatory deformations on-the-fly, which may heavily degrade the discriminative capability of the online generated visual features. With widely emerged deep learning showing its success for feature extraction in different recognition tasks, more and more deep models such as CNN have been demonstrated contributive to improving the performance of online tracking. However, only depending on the outputs from last layer of CNN is not an optimum representation since the coarse spatial resolution cannot guarantee an accurate localization for a qualified sampling process, especially when objects have severe deformations, sampling from the region with a pre-defined scale would inevitably guide a poor online learning. To overcome such a limitation of CNN based tracking, in this work, we incorporated spatial-temporal saliency detection to guide a more accurate target localization for qualified sampling within an inter-frame motion flow map. With an optional strategy for the output combination of intra-frame appearance correlations and inter-frame motion saliency based on a compositional energy optimization, the proposed tracking has shown a superior performance in comparison to the other state-of-art trackers on both challenging non-rigid and generic tracking benchmark datasets.

KW - CNN

KW - Saliency

KW - Sampling

KW - Spatial-temporal

KW - Tracking

UR - http://www.scopus.com/inward/record.url?scp=85012922935&partnerID=8YFLogxK

U2 - 10.1016/j.neucom.2016.10.073

DO - 10.1016/j.neucom.2016.10.073

M3 - Article

AN - SCOPUS:85012922935

SN - 0925-2312

VL - 257

SP - 115

EP - 127

JO - Neurocomputing

JF - Neurocomputing

ER -
