User-Ranking Video Summarization with Multi-Stage Spatio-Temporal Representation

Siyu Huang, Xi Li, Zhongfei Zhang, Fei Wu, Junwei Han

Research output: Contribution to journal › Article › peer-review

49 Scopus citations

Abstract

Video summarization is a challenging task, mainly due to the difficulties in learning complicated semantic structural relations between videos and summaries. In this paper, we present a novel supervised video summarization scheme based on three-stage deep neural networks. The scheme takes a divide-and-conquer strategy to resolve the complicated task of 3D video summarization into a set of easy and flexible computational subtasks, and then sequentially applies 2D CNNs, 1D CNNs, and long short-term memory (LSTM) to address the subtasks in a hierarchical fashion. The hierarchical modeling of spatio-temporal structure leads to high performance and efficiency. In addition, we propose a simple but effective user-ranking method to cope with the labeling subjectivity problem of user-created video summarization, which refines labeling quality for robust supervised learning. Experimental results show that our approach outperforms the state-of-the-art video summarization methods on two benchmark datasets.
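
The staged decomposition described in the abstract (a spatial stage per frame, then a temporal convolution, then a recurrent pass) can be illustrated with a minimal, dependency-free sketch. Everything below is an assumption made for illustration: the function names, the pooling used in place of a 2D CNN, the fixed smoothing kernel used in place of a learned 1D CNN, and the leaky running state used in place of an LSTM are stand-ins that only show how data flows through three stages. They are not the authors' networks, and the user-ranking step is not modeled.

```python
import math


def spatial_stage(frame):
    """Stage 1 (stand-in for a 2D CNN): reduce one H x W frame to a feature vector."""
    height, width = len(frame), len(frame[0])
    mean = sum(sum(row) for row in frame) / (height * width)
    peak = max(max(row) for row in frame)
    return [mean, peak]


def temporal_conv_stage(features, kernel):
    """Stage 2 (stand-in for a 1D CNN): 'valid' convolution along the time axis."""
    k = len(kernel)
    channels = len(features[0])
    out = []
    for t in range(len(features) - k + 1):
        window = features[t:t + k]
        out.append([sum(kernel[i] * window[i][c] for i in range(k))
                    for c in range(channels)])
    return out


def recurrent_stage(sequence, decay=0.5):
    """Stage 3 (stand-in for an LSTM): leaky running state mapped to a score in (0, 1)."""
    state = 0.0
    scores = []
    for vec in sequence:
        state = decay * state + (1.0 - decay) * sum(vec) / len(vec)
        scores.append(1.0 / (1.0 + math.exp(-state)))
    return scores


def importance_scores(video, kernel=(0.25, 0.5, 0.25)):
    """Run the three stages in order: frames -> features -> local context -> scores."""
    per_frame = [spatial_stage(frame) for frame in video]
    local = temporal_conv_stage(per_frame, list(kernel))
    return recurrent_stage(local)
```

Calling `importance_scores` on a list of T frames returns T - 2 scores with the default three-tap kernel, since the temporal stage uses no padding. A real system would replace each stand-in with a trained network and select summary segments from the resulting scores.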

Original language: English
Article number: 8585041
Pages (from-to): 2654-2664
Number of pages: 11
Journal: IEEE Transactions on Image Processing
Volume: 28
Issue number: 6
State: Published - Jun 2019

Keywords

  • convolutional neural network
  • multi-user inconsistency
  • recurrent neural network
  • user ranking
  • video summarization
