User-Ranking Video Summarization with Multi-Stage Spatio-Temporal Representation

Siyu Huang; Xi Li; Zhongfei Zhang; Fei Wu; Junwei Han

doi:10.1109/TIP.2018.2889265

School of Automation

Research output: Contribution to journal › Article › Peer-reviewed

49 Citations (Scopus)

Abstract

Video summarization is a challenging task, mainly due to the difficulties in learning complicated semantic structural relations between videos and summaries. In this paper, we present a novel supervised video summarization scheme based on three-stage deep neural networks. The scheme takes a divide-and-conquer strategy to decompose the complicated task of 3D video summarization into a set of easy and flexible computational subtasks, and then sequentially applies 2D CNNs, 1D CNNs, and long short-term memory (LSTM) to address the subtasks in a hierarchical fashion. The hierarchical modeling of spatio-temporal structure leads to high performance and efficiency. In addition, we propose a simple but effective user-ranking method to cope with the labeling subjectivity problem of user-created video summarization, which refines labeling quality for robust supervised learning. Experimental results show that our approach outperforms the state-of-the-art video summarization methods on two benchmark datasets.
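
As a rough illustration of the data flow the abstract describes (per-frame spatial features, then convolution along time, then a recurrent pass that scores frames), here is a toy sketch in plain Python. It is not the authors' model: the hand-written functions `spatial_stage`, `temporal_conv_stage`, and `recurrent_stage` are hypothetical stand-ins for the 2D CNN, 1D CNN, and LSTM stages, they have no learned weights, and the top-k frame selection in `summarize` is an assumption made only for this example.

```python
import math
from typing import List

Frame = List[List[float]]  # one grayscale frame: H x W pixel values
Video = List[Frame]        # T frames


def spatial_stage(frame: Frame) -> List[float]:
    """Toy stand-in for the 2D CNN stage: one frame -> small feature vector.

    Features: mean intensity and mean absolute horizontal gradient.
    """
    pixels = [p for row in frame for p in row]
    mean = sum(pixels) / len(pixels)
    grads = [abs(row[i + 1] - row[i]) for row in frame for i in range(len(row) - 1)]
    grad = sum(grads) / len(grads) if grads else 0.0
    return [mean, grad]


def temporal_conv_stage(feats: List[List[float]], kernel: List[float]) -> List[List[float]]:
    """Toy stand-in for the 1D CNN stage: per-channel convolution along time.

    Edges are handled by repeating the first/last frame's features.
    """
    half = len(kernel) // 2
    n_frames = len(feats)
    out = []
    for t in range(n_frames):
        vec = []
        for c in range(len(feats[0])):
            acc = 0.0
            for k, w in enumerate(kernel):
                idx = min(max(t + k - half, 0), n_frames - 1)
                acc += w * feats[idx][c]
            vec.append(acc)
        out.append(vec)
    return out


def recurrent_stage(feats: List[List[float]], decay: float = 0.5) -> List[float]:
    """Toy stand-in for the LSTM stage: a single decaying state carried
    across frames, squashed to a per-frame importance score in (0, 1)."""
    state = 0.0
    scores = []
    for vec in feats:
        state = decay * state + (1.0 - decay) * sum(vec)
        scores.append(1.0 / (1.0 + math.exp(-state)))
    return scores


def summarize(video: Video, keep: int) -> List[int]:
    """Run the three stages in order and return the indices of the `keep`
    highest-scoring frames, in temporal order (assumed selection rule)."""
    feats = [spatial_stage(frame) for frame in video]
    feats = temporal_conv_stage(feats, [0.25, 0.5, 0.25])
    scores = recurrent_stage(feats)
    ranked = sorted(range(len(video)), key=lambda t: scores[t], reverse=True)
    return sorted(ranked[:keep])


if __name__ == "__main__":
    dark = [[0.0, 0.0], [0.0, 0.0]]
    bright = [[5.0, 5.0], [5.0, 5.0]]
    print(summarize([dark, dark, bright, dark], keep=2))  # prints [2, 3]
```

The point of the sketch is only the staging: each stage consumes the previous stage's output, so the spatial, short-range temporal, and long-range sequential parts can be swapped independently.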

Original language	English
Article number	8585041
Pages (from-to)	2654-2664
Number of pages	11
Journal	IEEE Transactions on Image Processing
Volume	28
Issue number	6
DOI	https://doi.org/10.1109/TIP.2018.2889265
Publication status	Published - Jun 2019

Cite this

@article{e5cf79486bce4760b1dcea3ffd793460,

title = "User-Ranking Video Summarization with Multi-Stage Spatio-Temporal Representation",

abstract = "Video summarization is a challenging task, mainly due to the difficulties in learning complicated semantic structural relations between videos and summaries. In this paper, we present a novel supervised video summarization scheme based on three-stage deep neural networks. The scheme takes a divide-and-conquer strategy to decompose the complicated task of 3D video summarization into a set of easy and flexible computational subtasks, and then sequentially applies 2D CNNs, 1D CNNs, and long short-term memory (LSTM) to address the subtasks in a hierarchical fashion. The hierarchical modeling of spatio-temporal structure leads to high performance and efficiency. In addition, we propose a simple but effective user-ranking method to cope with the labeling subjectivity problem of user-created video summarization, which refines labeling quality for robust supervised learning. Experimental results show that our approach outperforms the state-of-the-art video summarization methods on two benchmark datasets.",

keywords = "convolutional neural network, multi-user inconsistency, recurrent neural network, user ranking, Video summarization",

author = "Siyu Huang and Xi Li and Zhongfei Zhang and Fei Wu and Junwei Han",

note = "Publisher Copyright: {\textcopyright} 1992-2012 IEEE.",

year = "2019",

month = jun,

doi = "10.1109/TIP.2018.2889265",

language = "English",

volume = "28",

pages = "2654--2664",

journal = "IEEE Transactions on Image Processing",

issn = "1057-7149",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

number = "6",

}

TY - JOUR

T1 - User-Ranking Video Summarization with Multi-Stage Spatio-Temporal Representation

AU - Huang, Siyu

AU - Li, Xi

AU - Zhang, Zhongfei

AU - Wu, Fei

AU - Han, Junwei

PY - 2019/6

Y1 - 2019/6

N2 - Video summarization is a challenging task, mainly due to the difficulties in learning complicated semantic structural relations between videos and summaries. In this paper, we present a novel supervised video summarization scheme based on three-stage deep neural networks. The scheme takes a divide-and-conquer strategy to decompose the complicated task of 3D video summarization into a set of easy and flexible computational subtasks, and then sequentially applies 2D CNNs, 1D CNNs, and long short-term memory (LSTM) to address the subtasks in a hierarchical fashion. The hierarchical modeling of spatio-temporal structure leads to high performance and efficiency. In addition, we propose a simple but effective user-ranking method to cope with the labeling subjectivity problem of user-created video summarization, which refines labeling quality for robust supervised learning. Experimental results show that our approach outperforms the state-of-the-art video summarization methods on two benchmark datasets.

AB - Video summarization is a challenging task, mainly due to the difficulties in learning complicated semantic structural relations between videos and summaries. In this paper, we present a novel supervised video summarization scheme based on three-stage deep neural networks. The scheme takes a divide-and-conquer strategy to decompose the complicated task of 3D video summarization into a set of easy and flexible computational subtasks, and then sequentially applies 2D CNNs, 1D CNNs, and long short-term memory (LSTM) to address the subtasks in a hierarchical fashion. The hierarchical modeling of spatio-temporal structure leads to high performance and efficiency. In addition, we propose a simple but effective user-ranking method to cope with the labeling subjectivity problem of user-created video summarization, which refines labeling quality for robust supervised learning. Experimental results show that our approach outperforms the state-of-the-art video summarization methods on two benchmark datasets.

KW - convolutional neural network

KW - multi-user inconsistency

KW - recurrent neural network

KW - user ranking

KW - Video summarization

UR - http://www.scopus.com/inward/record.url?scp=85058992212&partnerID=8YFLogxK

U2 - 10.1109/TIP.2018.2889265

DO - 10.1109/TIP.2018.2889265

M3 - Article

AN - SCOPUS:85058992212

SN - 1057-7149

VL - 28

SP - 2654

EP - 2664

JO - IEEE Transactions on Image Processing

JF - IEEE Transactions on Image Processing

IS - 6

M1 - 8585041

ER -
