Hierarchical recurrent neural network for video summarization

Bin Zhao, Xuelong Li, Xiaoqiang Lu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

162 Citations (Scopus)

Abstract

Exploiting the temporal dependency among video frames or subshots is very important for the task of video summarization. In practice, RNNs are good at modeling temporal dependency and have achieved strong performance in many video-based tasks, such as video captioning and classification. However, RNNs are not capable enough to handle the video summarization task, since traditional RNNs, including LSTM, can only deal with short videos, while the videos in the summarization task are usually longer. To address this problem, we propose a hierarchical recurrent neural network for video summarization, called H-RNN in this paper. Specifically, it has two layers: the first layer encodes short video subshots cut from the original video, and the final hidden state of each subshot is input to the second layer, which calculates its confidence of being a key subshot. Compared to traditional RNNs, H-RNN is more suitable for video summarization, since it can exploit long temporal dependency among frames while significantly reducing the number of computation operations. Results on two popular datasets, the Combined dataset and the VTW dataset, demonstrate that the proposed H-RNN outperforms the state of the art.
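
The two-layer structure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. Plain tanh recurrent cells stand in for whatever recurrent units the paper uses, the second layer is unidirectional, the weights are random and untrained, and the subshots have a fixed length. All names here (`rnn_step`, `split_into_subshots`, `hierarchical_scores`) are hypothetical and chosen only for illustration.

```python
import math
import random


def make_matrix(rows, cols, rng):
    """Small random weight matrix (untrained, for illustration only)."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def rnn_step(w_in, w_rec, x, h):
    """One plain recurrent step: h' = tanh(W_in x + W_rec h)."""
    return [
        math.tanh(
            sum(wi * xi for wi, xi in zip(w_in[j], x))
            + sum(wr * hi for wr, hi in zip(w_rec[j], h))
        )
        for j in range(len(h))
    ]


def split_into_subshots(frames, subshot_len):
    """Cut the frame sequence into consecutive fixed-length subshots (an assumption)."""
    return [frames[i:i + subshot_len] for i in range(0, len(frames), subshot_len)]


def hierarchical_scores(frames, subshot_len, hidden, rng):
    """Return one confidence in (0, 1) per subshot."""
    feat_dim = len(frames[0])
    w1_in, w1_rec = make_matrix(hidden, feat_dim, rng), make_matrix(hidden, hidden, rng)
    w2_in, w2_rec = make_matrix(hidden, hidden, rng), make_matrix(hidden, hidden, rng)
    w_out = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]

    # First layer: encode each subshot on its own and keep its final hidden state.
    subshot_codes = []
    for subshot in split_into_subshots(frames, subshot_len):
        h = [0.0] * hidden
        for frame in subshot:
            h = rnn_step(w1_in, w1_rec, frame, h)
        subshot_codes.append(h)

    # Second layer: run over the subshot codes and emit a confidence per subshot.
    scores = []
    h = [0.0] * hidden
    for code in subshot_codes:
        h = rnn_step(w2_in, w2_rec, code, h)
        logit = sum(w * hi for w, hi in zip(w_out, h))
        scores.append(1.0 / (1.0 + math.exp(-logit)))  # sigmoid
    return scores


rng = random.Random(0)
# 20 toy frames, each a 4-dimensional feature vector (made-up data).
frames = [[rng.random() for _ in range(4)] for _ in range(20)]
scores = hierarchical_scores(frames, subshot_len=5, hidden=3, rng=rng)
```

In this toy setting, 20 frames are cut into 4 subshots of 5 frames each. No recurrent chain is longer than 5 steps in the first layer or 4 steps in the second, whereas a single flat RNN over the same video would need one chain of 20 steps.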

Original language: English
Title of host publication: MM 2017 - Proceedings of the 2017 ACM Multimedia Conference
Publisher: Association for Computing Machinery, Inc
Pages: 863-871
Number of pages: 9
ISBN (Electronic): 9781450349062
DOI
Publication status: Published - 23 Oct 2017
Event: 25th ACM International Conference on Multimedia, MM 2017 - Mountain View, United States
Duration: 23 Oct 2017 to 27 Oct 2017

Publication series

Name: MM 2017 - Proceedings of the 2017 ACM Multimedia Conference

Conference

Conference: 25th ACM International Conference on Multimedia, MM 2017
Country/Territory: United States
City: Mountain View
Period: 23/10/17 to 27/10/17
