TY - JOUR
T1 - Robust tracking based on H-CNN with low-resource sampling and scaling by frame-wise motion localization
AU - Zhang, Peng
AU - Zhuo, Tao
AU - Huang, Hanqiao
AU - Chen, Kangli
AU - Zhang, Bo
AU - Kankanhalli, Mohan
N1 - Publisher Copyright: © 2017, Springer Science+Business Media New York.
PY - 2018/7/1
Y1 - 2018/7/1
N2 - In the age of big data, learning with deep models has proven highly effective in a variety of vision tasks. Unfortunately, the requirement for enormous numbers of training samples and the computational cost still limit its practicality in low-resource media computing applications such as online object tracking. More recently, CNN-based feature extraction has helped tracking-by-learning strategies make significant progress, although the coarse-resolution outputs of the last layer still substantially limit further improvement in tracking performance. When the hierarchy of convolutional layers is exploited as an image pyramid representation, the earlier convolutional layers of a hierarchical CNN improve spatial localization but are less invariant to changes in target appearance, which inevitably leads to an inaccurate sampling region when non-rigid objects have intrinsic motion. To guarantee qualified sampling for tracking-by-learning with a hierarchical CNN, in this paper we combine inter-frame motion guidance with intra-frame appearance correlations by formulating different energy optimization processes in both the spatial and temporal domains. With an optional function for combining the extracted regions, the proposed algorithm achieves more precise target localization for qualified sampling. Experiments on a challenging non-rigid tracking benchmark dataset demonstrate the superior performance of the proposed tracker compared with other state-of-the-art trackers.
KW - CNN
KW - Hierarchical
KW - Motion
KW - Online tracking
KW - Sampling
UR - http://www.scopus.com/inward/record.url?scp=85013460173&partnerID=8YFLogxK
U2 - 10.1007/s11042-017-4493-4
DO - 10.1007/s11042-017-4493-4
M3 - Article
AN - SCOPUS:85013460173
SN - 1380-7501
VL - 77
SP - 18781
EP - 18800
JO - Multimedia Tools and Applications
JF - Multimedia Tools and Applications
IS - 14
ER -