TY - GEN
T1 - Hierarchical recurrent neural network for video summarization
AU - Zhao, Bin
AU - Li, Xuelong
AU - Lu, Xiaoqiang
N1 - Publisher Copyright:
© 2017 ACM.
PY - 2017/10/23
Y1 - 2017/10/23
N2 - Exploiting the temporal dependency among video frames or subshots is essential for video summarization. RNNs are well suited to modeling temporal dependency and have achieved strong performance in many video-based tasks, such as video captioning and classification. However, traditional RNNs, including LSTMs, struggle with video summarization, since they can only handle short videos, whereas videos in the summarization task are usually much longer. To address this problem, we propose a hierarchical recurrent neural network for video summarization, called H-RNN in this paper. Specifically, it has two layers: the first layer encodes short video subshots cut from the original video, and the final hidden state of each subshot is fed into the second layer, which computes its confidence of being a key subshot. Compared with traditional RNNs, H-RNN is better suited to video summarization, since it can exploit long-range temporal dependency among frames while significantly reducing the number of computation operations. Results on two popular datasets, the Combined dataset and the VTW dataset, demonstrate that the proposed H-RNN outperforms state-of-the-art methods.
KW - Deep learning
KW - Hierarchical recurrent neural network
KW - Video summarization
UR - http://www.scopus.com/inward/record.url?scp=85035233078&partnerID=8YFLogxK
U2 - 10.1145/3123266.3123328
DO - 10.1145/3123266.3123328
M3 - Conference contribution
AN - SCOPUS:85035233078
T3 - MM 2017 - Proceedings of the 2017 ACM Multimedia Conference
SP - 863
EP - 871
BT - MM 2017 - Proceedings of the 2017 ACM Multimedia Conference
PB - Association for Computing Machinery, Inc
T2 - 25th ACM International Conference on Multimedia, MM 2017
Y2 - 23 October 2017 through 27 October 2017
ER -