HSA-RNN: Hierarchical Structure-Adaptive RNN for Video Summarization

Bin Zhao; Xuelong Li; Xiaoqiang Lu

doi:10.1109/CVPR.2018.00773

School of Artificial Intelligence, Optics and Electronics

CAS - Xi'an Institute of Optics and Precision Mechanics

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

205 Scopus citations

Abstract

Although video summarization has achieved great success in recent years, few approaches have considered the influence of video structure on the summarization results. Video data follow a hierarchical structure: a video is composed of shots, and a shot is composed of several frames. Generally, shots provide the activity-level information that people need to understand the video content. However, few existing summarization approaches pay attention to the shot segmentation procedure. They generate shots by trivial strategies, such as fixed-length segmentation, which may destroy the underlying hierarchical structure of the video data and further reduce the quality of the generated summaries. To address this problem, we propose a structure-adaptive video summarization approach that integrates shot segmentation and video summarization into a Hierarchical Structure-Adaptive RNN, denoted as HSA-RNN. We evaluate the proposed approach on four popular datasets, i.e., SumMe, TVsum, CoSum and VTW. The experimental results demonstrate the effectiveness of HSA-RNN in the video summarization task.

Original language	English
Title of host publication	Proceedings - 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2018
Publisher	IEEE Computer Society
Pages	7405-7414
Number of pages	10
ISBN (Electronic)	9781538664209
DOIs	https://doi.org/10.1109/CVPR.2018.00773
State	Published - 14 Dec 2018
Event	31st Meeting of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2018 - Salt Lake City, United States
Duration	18 Jun 2018 → 22 Jun 2018

Publication series

Name	Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
ISSN (Print)	1063-6919

Conference

Conference	31st Meeting of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2018
Country/Territory	United States
City	Salt Lake City
Period	18/06/18 → 22/06/18

Access to Document

10.1109/CVPR.2018.00773

Cite this

Zhao, B., Li, X., & Lu, X. (2018). HSA-RNN: Hierarchical Structure-Adaptive RNN for Video Summarization. In Proceedings - 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2018 (pp. 7405-7414). Article 8578871 (Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition). IEEE Computer Society. https://doi.org/10.1109/CVPR.2018.00773

@inproceedings{7943be1db87f4b978501c7143b71c42a,

title = "HSA-RNN: Hierarchical Structure-Adaptive RNN for Video Summarization",

abstract = "Although video summarization has achieved great success in recent years, few approaches have realized the influence of video structure on the summarization results. As we know, the video data follow a hierarchical structure, i.e., a video is composed of shots, and a shot is composed of several frames. Generally, shots provide the activity-level information for people to understand the video content. While few existing summarization approaches pay attention to the shot segmentation procedure. They generate shots by some trivial strategies, such as fixed length segmentation, which may destroy the underlying hierarchical structure of video data and further reduce the quality of generated summaries. To address this problem, we propose a structure-adaptive video summarization approach that integrates shot segmentation and video summarization into a Hierarchical Structure-Adaptive RNN, denoted as HSA-RNN. We evaluate the proposed approach on four popular datasets, i.e., SumMe, TVsum, CoSum and VTW. The experimental results have demonstrated the effectiveness of HSA-RNN in the video summarization task.",

author = "Bin Zhao and Xuelong Li and Xiaoqiang Lu",

note = "Publisher Copyright: {\textcopyright} 2018 IEEE.; 31st Meeting of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2018 ; Conference date: 18-06-2018 Through 22-06-2018",

year = "2018",

month = dec,

day = "14",

doi = "10.1109/CVPR.2018.00773",

language = "English",

series = "Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition",

publisher = "IEEE Computer Society",

pages = "7405--7414",

booktitle = "Proceedings - 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2018",

}

Zhao, B, Li, X & Lu, X 2018, HSA-RNN: Hierarchical Structure-Adaptive RNN for Video Summarization. in Proceedings - 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2018., 8578871, Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, IEEE Computer Society, pp. 7405-7414, 31st Meeting of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, United States, 18/06/18. https://doi.org/10.1109/CVPR.2018.00773

HSA-RNN: Hierarchical Structure-Adaptive RNN for Video Summarization. / Zhao, Bin; Li, Xuelong; Lu, Xiaoqiang.
Proceedings - 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2018. IEEE Computer Society, 2018. p. 7405-7414 8578871 (Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition).

TY - GEN

T1 - HSA-RNN: Hierarchical Structure-Adaptive RNN for Video Summarization

T2 - 31st Meeting of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2018

AU - Zhao, Bin

AU - Li, Xuelong

AU - Lu, Xiaoqiang

PY - 2018/12/14

Y1 - 2018/12/14

AB - Although video summarization has achieved great success in recent years, few approaches have realized the influence of video structure on the summarization results. As we know, the video data follow a hierarchical structure, i.e., a video is composed of shots, and a shot is composed of several frames. Generally, shots provide the activity-level information for people to understand the video content. While few existing summarization approaches pay attention to the shot segmentation procedure. They generate shots by some trivial strategies, such as fixed length segmentation, which may destroy the underlying hierarchical structure of video data and further reduce the quality of generated summaries. To address this problem, we propose a structure-adaptive video summarization approach that integrates shot segmentation and video summarization into a Hierarchical Structure-Adaptive RNN, denoted as HSA-RNN. We evaluate the proposed approach on four popular datasets, i.e., SumMe, TVsum, CoSum and VTW. The experimental results have demonstrated the effectiveness of HSA-RNN in the video summarization task.

UR - http://www.scopus.com/inward/record.url?scp=85062862552&partnerID=8YFLogxK

U2 - 10.1109/CVPR.2018.00773

DO - 10.1109/CVPR.2018.00773

M3 - Conference contribution

AN - SCOPUS:85062862552

T3 - Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition

SP - 7405

EP - 7414

BT - Proceedings - 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2018

PB - IEEE Computer Society

Y2 - 18 June 2018 through 22 June 2018

ER -
