Spatiotemporal modeling for video summarization using convolutional recurrent neural network

Yuan Yuan; Haopeng Li; Qi Wang

doi:10.1109/ACCESS.2019.2916989


School of Artificial Intelligence, Optics and Electronics

Northwestern Polytechnical University, Xi'an

Research output: Contribution to journal › Article › peer-review

41 Scopus citations

Abstract

In this paper, a novel neural network named CRSum is proposed for the video summarization task. The proposed network integrates feature extraction, temporal modeling, and summary generation into an end-to-end architecture. Compared with previous work on this task, the proposed method has three distinctive characteristics: 1) it is the first to leverage a convolutional recurrent neural network to model the spatial and temporal structure of video simultaneously for summarization; 2) comprehensive, fine-grained video features are obtained in the proposed architecture through trainable three-dimensional convolutional neural networks and feature fusion; and 3) a new loss function, named Sobolev loss, is defined to constrain the derivative of sequential data and thereby exploit the underlying temporal structure of video. A series of experiments is conducted to demonstrate the effectiveness of the proposed method. We further analyze the method from different aspects through carefully designed experiments.
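The abstract characterizes the Sobolev loss only as a loss that constrains the derivative of sequential data. The Python sketch below is an illustrative assumption, not the paper's exact definition: it treats the summarizer's output as a sequence of per-frame importance scores, approximates the derivative with first-order forward differences, and adds the derivative error to the ordinary mean squared error with a hypothetical weight `lam`. All function names are illustrative.

```python
from typing import Sequence


def finite_difference(x: Sequence[float]) -> list[float]:
    """Discrete first derivative via forward differences: x[t+1] - x[t]."""
    return [x[t + 1] - x[t] for t in range(len(x) - 1)]


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equal-length sequences (0.0 if empty)."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    if not a:
        return 0.0
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)


def sobolev_loss(pred: Sequence[float], target: Sequence[float],
                 lam: float = 1.0) -> float:
    """Sobolev-style loss: value error plus lam times derivative error.

    Assumption: `lam` is a hypothetical weighting hyperparameter and the
    derivative is approximated by forward differences; the paper's exact
    formulation may differ.
    """
    value_term = mse(pred, target)
    derivative_term = mse(finite_difference(pred), finite_difference(target))
    return value_term + lam * derivative_term


if __name__ == "__main__":
    target = [0.0, 0.0, 0.0, 0.0]
    print(sobolev_loss([1.0, 1.0, 1.0, 1.0], target))    # 1.0 (constant offset)
    print(sobolev_loss([1.0, -1.0, 1.0, -1.0], target))  # 5.0 (alternating error)
```

With `lam = 1.0`, a constant offset of 1 and an alternating ±1 error against an all-zero target have the same pointwise MSE (1.0), but the alternating prediction scores 5.0 because its frame-to-frame changes disagree with the target's. This is the sense in which a derivative term makes the loss sensitive to temporal structure rather than to per-frame values alone.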

Original language	English
Article number	8715406
Pages (from-to)	64676-64685
Number of pages	10
Journal	IEEE Access
Volume	7
DOIs	https://doi.org/10.1109/ACCESS.2019.2916989
State	Published - 2019

Keywords

CRNN
CRSum
Sobolev loss
spatiotemporal modeling
video summarization

Access to Document

10.1109/ACCESS.2019.2916989

Cite this

@article{94f1f4e361964a23990dab3ad99d4689,

title = "Spatiotemporal modeling for video summarization using convolutional recurrent neural network",

abstract = "In this paper, a novel neural network named CRSum is proposed for the video summarization task. The proposed network integrates feature extraction, temporal modeling, and summary generation into an end-to-end architecture. Compared with previous work on this task, the proposed method has three distinctive characteristics: 1) it is the first to leverage a convolutional recurrent neural network to model the spatial and temporal structure of video simultaneously for summarization; 2) comprehensive, fine-grained video features are obtained in the proposed architecture through trainable three-dimensional convolutional neural networks and feature fusion; and 3) a new loss function, named Sobolev loss, is defined to constrain the derivative of sequential data and thereby exploit the underlying temporal structure of video. A series of experiments is conducted to demonstrate the effectiveness of the proposed method. We further analyze the method from different aspects through carefully designed experiments.",

keywords = "CRNN, CRSum, Sobolev loss, spatiotemporal modeling, video summarization",

author = "Yuan Yuan and Haopeng Li and Qi Wang",

note = "Publisher Copyright: {\textcopyright} 2013 IEEE.",

year = "2019",

doi = "10.1109/ACCESS.2019.2916989",

language = "English",

volume = "7",

pages = "64676--64685",

journal = "IEEE Access",

issn = "2169-3536",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

}

TY  - JOUR
T1  - Spatiotemporal modeling for video summarization using convolutional recurrent neural network
AU  - Yuan, Yuan
AU  - Li, Haopeng
AU  - Wang, Qi
PY  - 2019
Y1  - 2019
N2  - In this paper, a novel neural network named CRSum is proposed for the video summarization task. The proposed network integrates feature extraction, temporal modeling, and summary generation into an end-to-end architecture. Compared with previous work on this task, the proposed method has three distinctive characteristics: 1) it is the first to leverage a convolutional recurrent neural network to model the spatial and temporal structure of video simultaneously for summarization; 2) comprehensive, fine-grained video features are obtained in the proposed architecture through trainable three-dimensional convolutional neural networks and feature fusion; and 3) a new loss function, named Sobolev loss, is defined to constrain the derivative of sequential data and thereby exploit the underlying temporal structure of video. A series of experiments is conducted to demonstrate the effectiveness of the proposed method. We further analyze the method from different aspects through carefully designed experiments.
AB  - In this paper, a novel neural network named CRSum is proposed for the video summarization task. The proposed network integrates feature extraction, temporal modeling, and summary generation into an end-to-end architecture. Compared with previous work on this task, the proposed method has three distinctive characteristics: 1) it is the first to leverage a convolutional recurrent neural network to model the spatial and temporal structure of video simultaneously for summarization; 2) comprehensive, fine-grained video features are obtained in the proposed architecture through trainable three-dimensional convolutional neural networks and feature fusion; and 3) a new loss function, named Sobolev loss, is defined to constrain the derivative of sequential data and thereby exploit the underlying temporal structure of video. A series of experiments is conducted to demonstrate the effectiveness of the proposed method. We further analyze the method from different aspects through carefully designed experiments.
KW  - CRNN
KW  - CRSum
KW  - Sobolev loss
KW  - spatiotemporal modeling
KW  - video summarization
UR  - http://www.scopus.com/inward/record.url?scp=85066615795&partnerID=8YFLogxK
U2  - 10.1109/ACCESS.2019.2916989
DO  - 10.1109/ACCESS.2019.2916989
M3  - Article
AN  - SCOPUS:85066615795
SN  - 2169-3536
VL  - 7
SP  - 64676
EP  - 64685
JO  - IEEE Access
JF  - IEEE Access
M1  - 8715406
ER  -
