Multi-scale deep feature fusion based sparse dictionary selection for video summarization

Xiao Wu, Mingyang Ma, Shuai Wan, Xiuxiu Han, Shaohui Mei

Research output: Journal article, peer-reviewed

2 citations (Scopus)

Abstract

The explosive growth of video data poses a series of new challenges in computer vision, and video summarization (VS) is becoming increasingly important. Recent work has shown the effectiveness of VS based on sparse dictionary selection (SDS), which selects a set of representative frames that can sufficiently reconstruct a given video. Existing SDS-based VS methods use conventional handcrafted features or single-scale deep features, which may limit summarization performance because the frame feature representation is underexploited. Deep learning techniques based on convolutional neural networks (CNNs) perform strongly across a wide range of vision tasks because CNNs provide excellent feature representations. Therefore, this paper proposes multi-scale deep feature fusion based sparse dictionary selection (MSDFF-SDS) for VS. Specifically, the multi-scale features comprise features extracted directly from the last fully connected layer and features from intermediate layers processed by global average pooling (GAP). VS is then formulated as the problem of minimizing the reconstruction error over the fused multi-scale deep features. In this formulation, the contribution of each feature scale can be adjusted by a balance parameter, and row-sparsity consistency of the simultaneous reconstruction coefficients is used to select as few keyframes as possible. The resulting MSDFF-SDS model is solved with an efficient greedy pursuit algorithm. Experimental results on two benchmark datasets show that MSDFF-SDS improves the F-score of keyframe-based summarization by more than 3% compared with existing SDS methods, and outperforms most deep-learning methods for skimming-based summarization.
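
The abstract does not state the objective function, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' MSDFF-SDS formulation or solver. It assumes that each frame is represented at two scales (a fully connected layer vector and a GAP-pooled intermediate feature map), that the balance parameters act as per-scale weights on a shared reconstruction error, and that keyframes are chosen by a generic greedy forward selection that stands in for the paper's greedy pursuit algorithm. The function names `gap`, `fuse_scales`, `reconstruction_error` and `greedy_select`, the weights `[1.0, 0.5]`, and the random toy "features" are all hypothetical illustrations, not CNN outputs or values from the paper.

```python
import numpy as np


def gap(feature_map: np.ndarray) -> np.ndarray:
    """Global average pooling of one intermediate feature map: (C, H, W) -> (C,)."""
    return feature_map.mean(axis=(1, 2))


def fuse_scales(scales: list[np.ndarray], weights: list[float]) -> np.ndarray:
    """Stack per-scale feature matrices (each d_s x n, one column per frame).

    Scaling scale s by sqrt(weight_s) makes the fused error ||D - D A||_F^2
    equal to sum_s weight_s * ||X_s - X_s A||_F^2 for a shared coefficient
    matrix A (an assumed way of applying the balance parameters).
    """
    return np.vstack([np.sqrt(w) * X for X, w in zip(scales, weights)])


def reconstruction_error(D: np.ndarray, selected: list[int]) -> float:
    """||D - D_S A||_F^2, where D_S holds the selected frames and A is the least-squares fit."""
    D_S = D[:, selected]
    A, *_ = np.linalg.lstsq(D_S, D, rcond=None)
    return float(np.linalg.norm(D - D_S @ A) ** 2)


def greedy_select(D: np.ndarray, k: int) -> list[int]:
    """Generic greedy forward selection: add the frame that most reduces the error.

    Only the selected frames (rows of A) may be nonzero, so this enforces
    row sparsity with at most k active rows.
    """
    selected: list[int] = []
    for _ in range(k):
        candidates = [j for j in range(D.shape[1]) if j not in selected]
        best = min(candidates, key=lambda j: reconstruction_error(D, selected + [j]))
        selected.append(best)
    return selected


# Toy "video": 6 frames from 3 shots, two identical frames per shot.
rng = np.random.default_rng(0)
shots = [0, 0, 1, 1, 2, 2]
fc = rng.standard_normal((8, 3))          # hypothetical FC-layer features, one per shot
maps = rng.standard_normal((3, 4, 2, 2))  # hypothetical intermediate maps (C=4, H=W=2)
X_fc = fc[:, shots]                                       # 8 x 6
X_mid = np.stack([gap(maps[s]) for s in shots], axis=1)   # 4 x 6
D = fuse_scales([X_fc, X_mid], weights=[1.0, 0.5])        # 12 x 6
selected = greedy_select(D, k=3)
print(sorted(s // 2 for s in selected))  # one keyframe per shot: [0, 1, 2]
```

Stacking the square-root-weighted scales into one matrix lets a single least-squares solve handle the weighted sum of per-scale errors, because all scales share the same coefficient matrix. In the toy example, adding a duplicate of an already selected frame never lowers the error, so the greedy step always moves on to an unrepresented shot.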

Original language: English
Article number: 117006
Journal: Signal Processing: Image Communication
Volume: 118
Publication status: Published - Oct 2023
