TY - GEN
T1 - Category driven deep recurrent neural network for video summarization
AU - Song, Xinhui
AU - Chen, Ke
AU - Lei, Jie
AU - Sun, Li
AU - Wang, Zhiyuan
AU - Xie, Lei
AU - Song, Mingli
N1 - Publisher Copyright:
© 2016 IEEE.
PY - 2016/9/22
Y1 - 2016/9/22
N2 - A large number of videos are generated and uploaded to video websites (such as Youku and YouTube) every day, and these websites play an increasingly important role in daily life. While convenient, this large volume of video data makes it harder to summarize a video so that users can browse it easily. Although many video summarization approaches exist, the selected key frames fail to integrate the broader video context, and the quality of the summarized results is difficult to evaluate because of the lack of ground truth. Inspired by previous key-frame extraction methods, we propose a deep recurrent neural network model that learns to extract category-driven key frames. First, we sequentially extract a fixed number of key frames using time-dependent location networks. Second, we use a recurrent neural network to integrate information from the key frames and classify the video's category. The quality of the extracted key frames can therefore be evaluated by the categorization accuracy. Experiments on a 500-video dataset show that the proposed scheme extracts reasonable key frames and outperforms other methods in quantitative evaluation.
AB - A large number of videos are generated and uploaded to video websites (such as Youku and YouTube) every day, and these websites play an increasingly important role in daily life. While convenient, this large volume of video data makes it harder to summarize a video so that users can browse it easily. Although many video summarization approaches exist, the selected key frames fail to integrate the broader video context, and the quality of the summarized results is difficult to evaluate because of the lack of ground truth. Inspired by previous key-frame extraction methods, we propose a deep recurrent neural network model that learns to extract category-driven key frames. First, we sequentially extract a fixed number of key frames using time-dependent location networks. Second, we use a recurrent neural network to integrate information from the key frames and classify the video's category. The quality of the extracted key frames can therefore be evaluated by the categorization accuracy. Experiments on a 500-video dataset show that the proposed scheme extracts reasonable key frames and outperforms other methods in quantitative evaluation.
KW - Recurrent video summarization
KW - Reinforcement learning
KW - Video categorization
UR - http://www.scopus.com/inward/record.url?scp=84992088974&partnerID=8YFLogxK
U2 - 10.1109/ICMEW.2016.7574720
DO - 10.1109/ICMEW.2016.7574720
M3 - Conference contribution
AN - SCOPUS:84992088974
T3 - 2016 IEEE International Conference on Multimedia and Expo Workshops, ICMEW 2016
BT - 2016 IEEE International Conference on Multimedia and Expo Workshops, ICMEW 2016
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2016 IEEE International Conference on Multimedia and Expo Workshops, ICMEW 2016
Y2 - 11 July 2016 through 15 July 2016
ER -