Going deeper with two-stream ConvNets for action recognition in video surveillance

Yamin Han, Peng Zhang, Tao Zhuo, Wei Huang, Yanning Zhang

Research output: Contribution to journal › Article › peer-review

74 Scopus citations

Abstract

Deep convolutional networks have proven highly effective in a variety of vision-based classification tasks, but they depend on large datasets to achieve high performance. In many realistic settings, such as human action recognition in videos, a massive quantity of training samples is not always available, and the resulting data deficiency, especially of labeled data, critically limits the use of deeper model structures because of their high risk of overfitting. Additionally, when modeling capacity is constrained by model depth, the high-level visual cues that accompany human action, such as object interaction, scene context and pose variation, become extrinsic and intrinsic challenges for traditional deep convolutional networks. To address these limitations, we propose a dataset remodeling strategy that transfers the parameters of ResNet-101 layers trained on the ImageNet dataset to initialize the learning model and adopts an augmented data variation approach to overcome the overfitting caused by sample deficiency. To improve the model structure, a novel deeper two-stream ConvNet is designed to learn the complexity of actions. Combined with a disorder strategy for the training/testing video sets, the proposed model and learning strategy together achieve a significant improvement in action recognition. Experiments on two challenging datasets, UCF101 and KTH, verify superior performance in comparison with other state-of-the-art methods.
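
The abstract does not say how the two streams are combined at test time. As a minimal sketch only, assuming the late-fusion scheme common in two-stream networks (a weighted average of per-class softmax scores from a spatial and a temporal stream), the combination step could look like the following. The function names and the `temporal_weight` parameter are hypothetical and are not taken from the paper.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_two_stream(spatial_logits, temporal_logits, temporal_weight=0.5):
    """Weighted average of the two streams' class probabilities.

    Assumption: late fusion by averaging softmax scores; the paper's
    actual fusion rule is not described in the abstract.
    """
    if len(spatial_logits) != len(temporal_logits):
        raise ValueError("both streams must score the same set of classes")
    if not 0.0 <= temporal_weight <= 1.0:
        raise ValueError("temporal_weight must lie in [0, 1]")
    p_spatial = softmax(spatial_logits)
    p_temporal = softmax(temporal_logits)
    return [
        (1.0 - temporal_weight) * s + temporal_weight * t
        for s, t in zip(p_spatial, p_temporal)
    ]


def predict_action(spatial_logits, temporal_logits, temporal_weight=0.5):
    """Return the index of the class with the highest fused probability."""
    fused = fuse_two_stream(spatial_logits, temporal_logits, temporal_weight)
    return max(range(len(fused)), key=fused.__getitem__)
```

In this sketch, setting `temporal_weight` to 0 reduces the prediction to the spatial stream alone, and setting it to 1 reduces it to the temporal stream alone.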

Original language: English
Pages (from-to): 83-90
Number of pages: 8
Journal: Pattern Recognition Letters
Volume: 107
State: Published - 1 May 2018

Keywords

  • Action recognition
  • ConvNets
  • Deeper
  • Two-stream
  • Video surveillance
