User-Ranking Video Summarization with Multi-Stage Spatio-Temporal Representation

Siyu Huang; Xi Li; Zhongfei Zhang; Fei Wu; Junwei Han

doi:10.1109/TIP.2018.2889265

School of Automation

Research output: Contribution to journal › Article › peer-review

49 Scopus citations

Abstract

Video summarization is a challenging task, mainly due to the difficulties in learning complicated semantic structural relations between videos and summaries. In this paper, we present a novel supervised video summarization scheme based on three-stage deep neural networks. The scheme takes a divide-and-conquer strategy to resolve the complicated task of 3D video summarization into a set of easy and flexible computational subtasks, and then to sequentially perform 2D CNNs, 1D CNNs, and long short-term memory to address the subtasks in an hierarchical fashion. The hierarchical modeling of spatio-temporal structure leads to high performance and efficiency. In addition, we propose a simple but effective user-ranking method to cope with the labeling subjectivity problem of user-created video summarization, leading to the labeling quality refinement for robust supervised learning. Experimental results show that our approach outperforms the state-of-the-art video summarization methods on two benchmark datasets.

Original language	English
Article number	8585041
Pages (from-to)	2654-2664
Number of pages	11
Journal	IEEE Transactions on Image Processing
Volume	28
Issue number	6
DOIs	https://doi.org/10.1109/TIP.2018.2889265
State	Published - Jun 2019

Keywords

convolutional neural network
multi-user inconsistency
recurrent neural network
user ranking
Video summarization

Cite this

@article{e5cf79486bce4760b1dcea3ffd793460,

title = "User-Ranking Video Summarization with Multi-Stage Spatio-Temporal Representation",

abstract = "Video summarization is a challenging task, mainly due to the difficulties in learning complicated semantic structural relations between videos and summaries. In this paper, we present a novel supervised video summarization scheme based on three-stage deep neural networks. The scheme takes a divide-and-conquer strategy to resolve the complicated task of 3D video summarization into a set of easy and flexible computational subtasks, and then to sequentially perform 2D CNNs, 1D CNNs, and long short-term memory to address the subtasks in an hierarchical fashion. The hierarchical modeling of spatio-temporal structure leads to high performance and efficiency. In addition, we propose a simple but effective user-ranking method to cope with the labeling subjectivity problem of user-created video summarization, leading to the labeling quality refinement for robust supervised learning. Experimental results show that our approach outperforms the state-of-the-art video summarization methods on two benchmark datasets.",

keywords = "convolutional neural network, multi-user inconsistency, recurrent neural network, user ranking, Video summarization",

author = "Siyu Huang and Xi Li and Zhongfei Zhang and Fei Wu and Junwei Han",

note = "Publisher Copyright: {\textcopyright} 1992-2012 IEEE.",

year = "2019",

month = jun,

doi = "10.1109/TIP.2018.2889265",

language = "English",

volume = "28",

pages = "2654--2664",

journal = "IEEE Transactions on Image Processing",

issn = "1057-7149",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

number = "6",

}

TY - JOUR

T1 - User-Ranking Video Summarization with Multi-Stage Spatio-Temporal Representation

AU - Huang, Siyu

AU - Li, Xi

AU - Zhang, Zhongfei

AU - Wu, Fei

AU - Han, Junwei

PY - 2019/6

Y1 - 2019/6

N2 - Video summarization is a challenging task, mainly due to the difficulties in learning complicated semantic structural relations between videos and summaries. In this paper, we present a novel supervised video summarization scheme based on three-stage deep neural networks. The scheme takes a divide-and-conquer strategy to resolve the complicated task of 3D video summarization into a set of easy and flexible computational subtasks, and then to sequentially perform 2D CNNs, 1D CNNs, and long short-term memory to address the subtasks in an hierarchical fashion. The hierarchical modeling of spatio-temporal structure leads to high performance and efficiency. In addition, we propose a simple but effective user-ranking method to cope with the labeling subjectivity problem of user-created video summarization, leading to the labeling quality refinement for robust supervised learning. Experimental results show that our approach outperforms the state-of-the-art video summarization methods on two benchmark datasets.

AB - Video summarization is a challenging task, mainly due to the difficulties in learning complicated semantic structural relations between videos and summaries. In this paper, we present a novel supervised video summarization scheme based on three-stage deep neural networks. The scheme takes a divide-and-conquer strategy to resolve the complicated task of 3D video summarization into a set of easy and flexible computational subtasks, and then to sequentially perform 2D CNNs, 1D CNNs, and long short-term memory to address the subtasks in an hierarchical fashion. The hierarchical modeling of spatio-temporal structure leads to high performance and efficiency. In addition, we propose a simple but effective user-ranking method to cope with the labeling subjectivity problem of user-created video summarization, leading to the labeling quality refinement for robust supervised learning. Experimental results show that our approach outperforms the state-of-the-art video summarization methods on two benchmark datasets.

KW - convolutional neural network

KW - multi-user inconsistency

KW - recurrent neural network

KW - user ranking

KW - Video summarization

UR - http://www.scopus.com/inward/record.url?scp=85058992212&partnerID=8YFLogxK

U2 - 10.1109/TIP.2018.2889265

DO - 10.1109/TIP.2018.2889265

M3 - Article

AN - SCOPUS:85058992212

SN - 1057-7149

VL - 28

SP - 2654

EP - 2664

JO - IEEE Transactions on Image Processing

JF - IEEE Transactions on Image Processing

IS - 6

M1 - 8585041

ER -
