TY - JOUR
T1 - A General Framework for Edited Video and Raw Video Summarization
AU - Li, Xuelong
AU - Zhao, Bin
AU - Lu, Xiaoqiang
N1 - Publisher Copyright: © 1992-2012 IEEE.
PY - 2017/8
Y1 - 2017/8
AB - In this paper, we build a general summarization framework for both of edited video and raw video summarization. Overall, our work can be divided into three folds. 1) Four models are designed to capture the properties of video summaries, i.e., containing important people and objects (importance), representative to the video content (representativeness), no similar key-shots (diversity), and smoothness of the storyline (storyness). Specifically, these models are applicable to both edited videos and raw videos. 2) A comprehensive score function is built with the weighted combination of the aforementioned four models. Note that the weights of the four models in the score function, denoted as property-weight, are learned in a supervised manner. Besides, the property-weights are learned for edited videos and raw videos, respectively. 3) The training set is constructed with both edited videos and raw videos in order to make up the lack of training data. Particularly, each training video is equipped with a pair of mixing-coefficients, which can reduce the structure mess in the training set caused by the rough mixture. We test our framework on three data sets, including edited videos, short raw videos, and long raw videos. Experimental results have verified the effectiveness of the proposed framework.
KW - Video summary
KW - mixing-coefficient
KW - property-weight
KW - score function
UR - http://www.scopus.com/inward/record.url?scp=85020694303&partnerID=8YFLogxK
U2 - 10.1109/TIP.2017.2695887
DO - 10.1109/TIP.2017.2695887
M3 - Article
C2 - 28436870
AN - SCOPUS:85020694303
SN - 1057-7149
VL - 26
SP - 3652
EP - 3664
JO - IEEE Transactions on Image Processing
JF - IEEE Transactions on Image Processing
IS - 8
M1 - 7904630
ER -