Property-Constrained Dual Learning for Video Summarization

Bin Zhao; Xuelong Li; Xiaoqiang Lu

doi:10.1109/TNNLS.2019.2951680

School of Artificial Intelligence, Optics and Electronics

Research output: Contribution to journal › Article › peer-review

62 Scopus citations

Abstract

Video summarization is the technique of condensing large-scale videos into summaries composed of key-frames or key-shots so that viewers can browse the video content efficiently. Recently, supervised approaches have achieved great success by taking advantage of recurrent neural networks (RNNs). Most of them focus on generating summaries by maximizing the overlap between the generated summary and the ground truth. However, they neglect the most critical principle, i.e., whether the viewer can infer the original video content from the summary. As a result, existing approaches cannot preserve the summary quality well and usually demand large amounts of training data to reduce overfitting. In our view, video summarization has two tasks, i.e., generating summaries from videos and inferring the original content from summaries. Motivated by this, we propose a dual learning framework that integrates summary generation (the primal task) and video reconstruction (the dual task), which aims to reward the summary generator with the assistance of the video reconstructor. Moreover, to provide more guidance to the summary generator, two property models are developed to measure the representativeness and diversity of the generated summary. Experiments on four popular data sets (SumMe, TVsum, OVP, and YouTube) have demonstrated that our approach, with compact RNNs as the summary generator, using less training data, and even in the unsupervised setting, can achieve performance comparable to that of supervised approaches that adopt more complex summary generators and are trained on more annotated data.

Original language	English
Article number	8924889
Pages (from-to)	3989-4000
Number of pages	12
Journal	IEEE Transactions on Neural Networks and Learning Systems
Volume	31
Issue number	10
DOIs	https://doi.org/10.1109/TNNLS.2019.2951680
State	Published - Oct 2020

Keywords

Dual learning
property model
recurrent neural network (RNN)
video summarization

Access to Document

10.1109/TNNLS.2019.2951680

Cite this

@article{0365cdd12b3049ad90826fcbf8658834,

title = "Property-Constrained Dual Learning for Video Summarization",

abstract = "Video summarization is the technique to condense large-scale videos into summaries composed of key-frames or key-shots so that the viewers can browse the video content efficiently. Recently, supervised approaches have achieved great success by taking advantages of recurrent neural networks (RNNs). Most of them focus on generating summaries by maximizing the overlap between the generated summary and the ground truth. However, they neglect the most critical principle, i.e., whether the viewer can infer the original video content from the summary. As a result, existing approaches cannot preserve the summary quality well and usually demand large amounts of training data to reduce overfitting. In our view, video summarization has two tasks, i.e., generating summaries from videos and inferring the original content from summaries. Motivated by this, we propose a dual learning framework by integrating the summary generation (primal task) and video reconstruction (dual task) together, which targets to reward the summary generator under the assistance of the video reconstructor. Moreover, to provide more guidance to the summary generator, two property models are developed to measure the representativeness and diversity of the generated summary. Practically, experiments on four popular data sets (SumMe, TVsum, OVP, and YouTube) have demonstrated that our approach, with compact RNNs as the summary generator, using less training data, and even in the unsupervised setting, can get comparable performance with those supervised ones adopting more complex summary generators and trained on more annotated data.",

keywords = "Dual learning, property model, recurrent neural network (RNN), video summarization",

author = "Bin Zhao and Xuelong Li and Xiaoqiang Lu",

note = "Publisher Copyright: {\textcopyright} 2012 IEEE.",

year = "2020",

month = oct,

doi = "10.1109/TNNLS.2019.2951680",

language = "English",

volume = "31",

pages = "3989--4000",

journal = "IEEE Transactions on Neural Networks and Learning Systems",

issn = "2162-237X",

publisher = "IEEE Computational Intelligence Society",

number = "10",

}

TY - JOUR

T1 - Property-Constrained Dual Learning for Video Summarization

AU - Zhao, Bin

AU - Li, Xuelong

AU - Lu, Xiaoqiang

PY - 2020/10

Y1 - 2020/10

N2 - Video summarization is the technique to condense large-scale videos into summaries composed of key-frames or key-shots so that the viewers can browse the video content efficiently. Recently, supervised approaches have achieved great success by taking advantages of recurrent neural networks (RNNs). Most of them focus on generating summaries by maximizing the overlap between the generated summary and the ground truth. However, they neglect the most critical principle, i.e., whether the viewer can infer the original video content from the summary. As a result, existing approaches cannot preserve the summary quality well and usually demand large amounts of training data to reduce overfitting. In our view, video summarization has two tasks, i.e., generating summaries from videos and inferring the original content from summaries. Motivated by this, we propose a dual learning framework by integrating the summary generation (primal task) and video reconstruction (dual task) together, which targets to reward the summary generator under the assistance of the video reconstructor. Moreover, to provide more guidance to the summary generator, two property models are developed to measure the representativeness and diversity of the generated summary. Practically, experiments on four popular data sets (SumMe, TVsum, OVP, and YouTube) have demonstrated that our approach, with compact RNNs as the summary generator, using less training data, and even in the unsupervised setting, can get comparable performance with those supervised ones adopting more complex summary generators and trained on more annotated data.

KW - Dual learning

KW - property model

KW - recurrent neural network (RNN)

KW - video summarization

UR - http://www.scopus.com/inward/record.url?scp=85092680126&partnerID=8YFLogxK

U2 - 10.1109/TNNLS.2019.2951680

DO - 10.1109/TNNLS.2019.2951680

M3 - Article

C2 - 31825876

AN - SCOPUS:85092680126

SN - 2162-237X

VL - 31

SP - 3989

EP - 4000

JO - IEEE Transactions on Neural Networks and Learning Systems

JF - IEEE Transactions on Neural Networks and Learning Systems

IS - 10

M1 - 8924889

ER -
