A General Framework for Edited Video and Raw Video Summarization

Xuelong Li; Bin Zhao; Xiaoqiang Lu

doi:10.1109/TIP.2017.2695887

Institute of Optoelectronics and Intelligence (光电与智能研究院)

CAS - Xi'an Institute of Optics and Precision Mechanics

Research output: Contribution to journal › Article › Peer-reviewed

103 Citations (Scopus)

Abstract

In this paper, we build a general framework for summarizing both edited videos and raw videos. Our work makes three contributions. 1) Four models are designed to capture the properties of a good video summary: it contains important people and objects (importance), is representative of the video content (representativeness), has no similar key-shots (diversity), and has a smooth storyline (storyness). These models are applicable to both edited videos and raw videos. 2) A comprehensive score function is built as a weighted combination of the four models. The weights of the four models in the score function, called property-weights, are learned in a supervised manner, separately for edited videos and for raw videos. 3) The training set is constructed from both edited videos and raw videos to compensate for the lack of training data. In particular, each training video is assigned a pair of mixing-coefficients, which reduce the structural disorder that the rough mixture introduces into the training set. We test our framework on three data sets, covering edited videos, short raw videos, and long raw videos. Experimental results verify the effectiveness of the proposed framework.
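The weighted combination described in the abstract can be illustrated with a minimal sketch. The abstract gives no formulas, so everything below is an assumption made for illustration: the names `PropertyScores`, `summary_score`, and `best_summary`, the plain linear form of the combination, and the numeric weights. It is not the authors' implementation, and the learned property-weights from the paper are not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PropertyScores:
    """Per-summary outputs of the four property models (hypothetical container)."""
    importance: float
    representativeness: float
    diversity: float
    storyness: float


def summary_score(scores, weights):
    """Combine the four property scores linearly with the given property-weights.

    `weights` is a 4-tuple ordered as (importance, representativeness,
    diversity, storyness). The linear form is an assumption.
    """
    if len(weights) != 4:
        raise ValueError("expected exactly four property-weights")
    values = (
        scores.importance,
        scores.representativeness,
        scores.diversity,
        scores.storyness,
    )
    return sum(w * v for w, v in zip(weights, values))


def best_summary(candidates, weights):
    """Return the name of the candidate summary with the highest combined score."""
    return max(candidates, key=lambda name: summary_score(candidates[name], weights))


# Made-up weights: the abstract says weights are learned separately for
# edited and raw videos, but these values are purely illustrative.
EDITED_WEIGHTS = (0.4, 0.3, 0.2, 0.1)
RAW_WEIGHTS = (0.1, 0.2, 0.3, 0.4)

candidates = {
    "summary_a": PropertyScores(1.0, 0.0, 0.0, 0.0),
    "summary_b": PropertyScores(0.0, 0.0, 0.0, 1.0),
}
print(best_summary(candidates, EDITED_WEIGHTS))
print(best_summary(candidates, RAW_WEIGHTS))
```

Keeping one weight tuple per video type mirrors the abstract's statement that property-weights are learned for edited videos and raw videos respectively, so the same candidate summaries can rank differently depending on the video type.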

Original language	English
Article number	7904630
Pages (from-to)	3652-3664
Number of pages	13
Journal	IEEE Transactions on Image Processing
Volume	26
Issue number	8
DOI	https://doi.org/10.1109/TIP.2017.2695887
Publication status	Published - Aug 2017

Cite this

@article{fd25d07a7eda4c3ca6663ed574b3c0fc,

title = "A General Framework for Edited Video and Raw Video Summarization",

abstract = "In this paper, we build a general summarization framework for both of edited video and raw video summarization. Overall, our work can be divided into three folds. 1) Four models are designed to capture the properties of video summaries, i.e., containing important people and objects (importance), representative to the video content (representativeness), no similar key-shots (diversity), and smoothness of the storyline (storyness). Specifically, these models are applicable to both edited videos and raw videos. 2) A comprehensive score function is built with the weighted combination of the aforementioned four models. Note that the weights of the four models in the score function, denoted as property-weight, are learned in a supervised manner. Besides, the property-weights are learned for edited videos and raw videos, respectively. 3) The training set is constructed with both edited videos and raw videos in order to make up the lack of training data. Particularly, each training video is equipped with a pair of mixing-coefficients, which can reduce the structure mess in the training set caused by the rough mixture. We test our framework on three data sets, including edited videos, short raw videos, and long raw videos. Experimental results have verified the effectiveness of the proposed framework.",

keywords = "Video summary, mixing-coefficient, property-weight, score function",

author = "Xuelong Li and Bin Zhao and Xiaoqiang Lu",

note = "Publisher Copyright: {\textcopyright} 1992-2012 IEEE.",

year = "2017",

month = aug,

doi = "10.1109/TIP.2017.2695887",

language = "English",

volume = "26",

pages = "3652--3664",

journal = "IEEE Transactions on Image Processing",

issn = "1057-7149",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

number = "8",

}

TY - JOUR

T1 - A General Framework for Edited Video and Raw Video Summarization

AU - Li, Xuelong

AU - Zhao, Bin

AU - Lu, Xiaoqiang

PY - 2017/8

Y1 - 2017/8

N2 - In this paper, we build a general summarization framework for both of edited video and raw video summarization. Overall, our work can be divided into three folds. 1) Four models are designed to capture the properties of video summaries, i.e., containing important people and objects (importance), representative to the video content (representativeness), no similar key-shots (diversity), and smoothness of the storyline (storyness). Specifically, these models are applicable to both edited videos and raw videos. 2) A comprehensive score function is built with the weighted combination of the aforementioned four models. Note that the weights of the four models in the score function, denoted as property-weight, are learned in a supervised manner. Besides, the property-weights are learned for edited videos and raw videos, respectively. 3) The training set is constructed with both edited videos and raw videos in order to make up the lack of training data. Particularly, each training video is equipped with a pair of mixing-coefficients, which can reduce the structure mess in the training set caused by the rough mixture. We test our framework on three data sets, including edited videos, short raw videos, and long raw videos. Experimental results have verified the effectiveness of the proposed framework.

AB - In this paper, we build a general summarization framework for both of edited video and raw video summarization. Overall, our work can be divided into three folds. 1) Four models are designed to capture the properties of video summaries, i.e., containing important people and objects (importance), representative to the video content (representativeness), no similar key-shots (diversity), and smoothness of the storyline (storyness). Specifically, these models are applicable to both edited videos and raw videos. 2) A comprehensive score function is built with the weighted combination of the aforementioned four models. Note that the weights of the four models in the score function, denoted as property-weight, are learned in a supervised manner. Besides, the property-weights are learned for edited videos and raw videos, respectively. 3) The training set is constructed with both edited videos and raw videos in order to make up the lack of training data. Particularly, each training video is equipped with a pair of mixing-coefficients, which can reduce the structure mess in the training set caused by the rough mixture. We test our framework on three data sets, including edited videos, short raw videos, and long raw videos. Experimental results have verified the effectiveness of the proposed framework.

KW - Video summary

KW - mixing-coefficient

KW - property-weight

KW - score function

UR - http://www.scopus.com/inward/record.url?scp=85020694303&partnerID=8YFLogxK

U2 - 10.1109/TIP.2017.2695887

DO - 10.1109/TIP.2017.2695887

M3 - Article

C2 - 28436870

AN - SCOPUS:85020694303

SN - 1057-7149

VL - 26

SP - 3652

EP - 3664

JO - IEEE Transactions on Image Processing

JF - IEEE Transactions on Image Processing

IS - 8

M1 - 7904630

ER -
