Hierarchical recurrent neural network for video summarization

Bin Zhao; Xuelong Li; Xiaoqiang Lu

doi:10.1145/3123266.3123328

Institute of Optoelectronics and Intelligence

CAS - Xi'an Institute of Optics and Precision Mechanics

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Peer-reviewed

162 Citations (Scopus)

Abstract

Exploiting the temporal dependency among video frames or subshots is very important for the task of video summarization. Practically, RNN is good at temporal dependency modeling, and has achieved overwhelming performance in many video-based tasks, such as video captioning and classification. However, RNN is not capable enough to handle the video summarization task, since traditional RNNs, including LSTM, can only deal with short videos, while the videos in the summarization task are usually in longer duration. To address this problem, we propose a hierarchical recurrent neural network for video summarization, called H-RNN in this paper. Specifically, it has two layers, where the first layer is utilized to encode short video subshots cut from the original video, and the final hidden state of each subshot is input to the second layer for calculating its confidence to be a key subshot. Compared to traditional RNNs, H-RNN is more suitable to video summarization, since it can exploit long temporal dependency among frames, meanwhile, the computation operations are significantly lessened. The results on two popular datasets, including the Combined dataset and VTW dataset, have demonstrated that the proposed H-RNN outperforms the state-of-the-arts.
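The two-layer structure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it swaps in a plain tanh recurrent cell instead of an LSTM, uses random untrained weights, cuts subshots at a fixed length, and every name in it (`hierarchical_scores`, `subshot_len`, `hidden`) is hypothetical. It only shows the data flow: the first layer encodes each subshot, and the final hidden state of each subshot feeds the second layer, which emits one confidence per subshot.

```python
import math
import random


def make_matrix(rows, cols, rng):
    """Small random weights; a trained model would learn these instead."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def rnn_step(w_in, w_rec, x, h):
    """One tanh recurrent step: h' = tanh(W_in x + W_rec h)."""
    return [
        math.tanh(
            sum(w * xi for w, xi in zip(w_in[j], x))
            + sum(w * hi for w, hi in zip(w_rec[j], h))
        )
        for j in range(len(h))
    ]


def run_rnn(w_in, w_rec, inputs):
    """Run the recurrence over a sequence and return every hidden state."""
    h = [0.0] * len(w_rec)
    states = []
    for x in inputs:
        h = rnn_step(w_in, w_rec, x, h)
        states.append(h)
    return states


def hierarchical_scores(frames, subshot_len, hidden=8, seed=0):
    """Return one confidence in (0, 1) per fixed-length subshot."""
    rng = random.Random(seed)
    feat_dim = len(frames[0])
    w1_in = make_matrix(hidden, feat_dim, rng)   # layer 1: frame -> hidden
    w1_rec = make_matrix(hidden, hidden, rng)
    w2_in = make_matrix(hidden, hidden, rng)     # layer 2: subshot code -> hidden
    w2_rec = make_matrix(hidden, hidden, rng)
    w_out = [rng.uniform(-0.1, 0.1) for _ in range(hidden)]

    # Assumption: subshots are cut at a fixed length; the last one may be shorter.
    subshots = [frames[i:i + subshot_len]
                for i in range(0, len(frames), subshot_len)]

    # Layer 1: encode each subshot independently and keep its final hidden state.
    codes = [run_rnn(w1_in, w1_rec, shot)[-1] for shot in subshots]

    # Layer 2: recur over the subshot codes, then squash each state to a score.
    states = run_rnn(w2_in, w2_rec, codes)
    return [
        1.0 / (1.0 + math.exp(-sum(w * s for w, s in zip(w_out, state))))
        for state in states
    ]


# Toy input: 50 frames with 4-dimensional features, cut into subshots of 10 frames.
frames = [[math.sin(0.1 * t + k) for k in range(4)] for t in range(50)]
scores = hierarchical_scores(frames, subshot_len=10)
print(len(scores))
```

In this sketch neither recurrence spans the whole video: the first layer unrolls over at most `subshot_len` frames at a time and the second over the number of subshots, which is one way to read the abstract's point about handling longer videos.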

Original language	English
Title of host publication	MM 2017 - Proceedings of the 2017 ACM Multimedia Conference
Publisher	Association for Computing Machinery, Inc
Pages	863-871
Number of pages	9
ISBN (Electronic)	9781450349062
DOI	https://doi.org/10.1145/3123266.3123328
Publication status	Published - 23 Oct 2017
Event	25th ACM International Conference on Multimedia, MM 2017 - Mountain View, United States. Duration: 23 Oct 2017 → 27 Oct 2017

Publication series

Name	MM 2017 - Proceedings of the 2017 ACM Multimedia Conference

Conference

Conference	25th ACM International Conference on Multimedia, MM 2017
Country/Territory	United States
City	Mountain View
Period	23/10/17 → 27/10/17

Cite this

@inproceedings{682e7a3bd21947aba47f81a3d07ef9a5,

title = "Hierarchical recurrent neural network for video summarization",

abstract = "Exploiting the temporal dependency among video frames or subshots is very important for the task of video summarization. Practically, RNN is good at temporal dependency modeling, and has achieved overwhelming performance in many video-based tasks, such as video captioning and classification. However, RNN is not capable enough to handle the video summarization task, since traditional RNNs, including LSTM, can only deal with short videos, while the videos in the summarization task are usually in longer duration. To address this problem, we propose a hierarchical recurrent neural network for video summarization, called H-RNN in this paper. Specifically, it has two layers, where the first layer is utilized to encode short video subshots cut from the original video, and the final hidden state of each subshot is input to the second layer for calculating its confidence to be a key subshot. Compared to traditional RNNs, H-RNN is more suitable to video summarization, since it can exploit long temporal dependency among frames, meanwhile, the computation operations are significantly lessened. The results on two popular datasets, including the Combined dataset and VTW dataset, have demonstrated that the proposed H-RNN outperforms the state-of-the-arts.",

keywords = "Deep learning, Hierarchical recurrent neural network, Video summarization",

author = "Bin Zhao and Xuelong Li and Xiaoqiang Lu",

note = "Publisher Copyright: {\textcopyright} 2017 ACM.; 25th ACM International Conference on Multimedia, MM 2017 ; Conference date: 23-10-2017 Through 27-10-2017",

year = "2017",

month = oct,

day = "23",

doi = "10.1145/3123266.3123328",

language = "English",

series = "MM 2017 - Proceedings of the 2017 ACM Multimedia Conference",

publisher = "Association for Computing Machinery, Inc",

pages = "863--871",

booktitle = "MM 2017 - Proceedings of the 2017 ACM Multimedia Conference",

}

Zhao, B, Li, X & Lu, X 2017, Hierarchical recurrent neural network for video summarization. in MM 2017 - Proceedings of the 2017 ACM Multimedia Conference. MM 2017 - Proceedings of the 2017 ACM Multimedia Conference, Association for Computing Machinery, Inc, pp. 863-871, 25th ACM International Conference on Multimedia, MM 2017, Mountain View, United States, 23/10/17. https://doi.org/10.1145/3123266.3123328

TY - GEN

T1 - Hierarchical recurrent neural network for video summarization

AU - Zhao, Bin

AU - Li, Xuelong

AU - Lu, Xiaoqiang

PY - 2017/10/23

Y1 - 2017/10/23

N2 - Exploiting the temporal dependency among video frames or subshots is very important for the task of video summarization. Practically, RNN is good at temporal dependency modeling, and has achieved overwhelming performance in many video-based tasks, such as video captioning and classification. However, RNN is not capable enough to handle the video summarization task, since traditional RNNs, including LSTM, can only deal with short videos, while the videos in the summarization task are usually in longer duration. To address this problem, we propose a hierarchical recurrent neural network for video summarization, called H-RNN in this paper. Specifically, it has two layers, where the first layer is utilized to encode short video subshots cut from the original video, and the final hidden state of each subshot is input to the second layer for calculating its confidence to be a key subshot. Compared to traditional RNNs, H-RNN is more suitable to video summarization, since it can exploit long temporal dependency among frames, meanwhile, the computation operations are significantly lessened. The results on two popular datasets, including the Combined dataset and VTW dataset, have demonstrated that the proposed H-RNN outperforms the state-of-the-arts.

AB - Exploiting the temporal dependency among video frames or subshots is very important for the task of video summarization. Practically, RNN is good at temporal dependency modeling, and has achieved overwhelming performance in many video-based tasks, such as video captioning and classification. However, RNN is not capable enough to handle the video summarization task, since traditional RNNs, including LSTM, can only deal with short videos, while the videos in the summarization task are usually in longer duration. To address this problem, we propose a hierarchical recurrent neural network for video summarization, called H-RNN in this paper. Specifically, it has two layers, where the first layer is utilized to encode short video subshots cut from the original video, and the final hidden state of each subshot is input to the second layer for calculating its confidence to be a key subshot. Compared to traditional RNNs, H-RNN is more suitable to video summarization, since it can exploit long temporal dependency among frames, meanwhile, the computation operations are significantly lessened. The results on two popular datasets, including the Combined dataset and VTW dataset, have demonstrated that the proposed H-RNN outperforms the state-of-the-arts.

KW - Deep learning

KW - Hierarchical recurrent neural network

KW - Video summarization

UR - http://www.scopus.com/inward/record.url?scp=85035233078&partnerID=8YFLogxK

U2 - 10.1145/3123266.3123328

DO - 10.1145/3123266.3123328

M3 - Conference contribution

AN - SCOPUS:85035233078

T3 - MM 2017 - Proceedings of the 2017 ACM Multimedia Conference

SP - 863

EP - 871

BT - MM 2017 - Proceedings of the 2017 ACM Multimedia Conference

PB - Association for Computing Machinery, Inc

T2 - 25th ACM International Conference on Multimedia, MM 2017

Y2 - 23 October 2017 through 27 October 2017

ER -
