Video summarization with a dual-path attentive network

Guoqiang Liang, Yanbing Lv, Shucheng Li, Xiahong Wang, Yanning Zhang

Research output: Contribution to journal › Article › peer-review

21 Citations (Scopus)

Abstract

With the explosive growth of videos captured every day, efficiently extracting useful information from them has become an increasingly important problem. Video summarization, which aims to extract the most important frames or shots, has attracted increasing interest as one of the most effective approaches. Most current methods employ a recurrent structure; however, its step-by-step nature makes such models difficult to parallelize. To address this problem, we propose a dual-path attentive video summarization framework consisting of a temporal-spatial encoder, a score-aware encoder, and a decoder, all built mainly on multi-head self-attention and the convolutional block attention module. The temporal-spatial encoder captures temporal and spatial information, while the score-aware encoder combines appearance features with previously predicted frame-level importance scores. By combining scores and appearance features, our model better captures long-range global dependencies and continuously updates the importance scores of previous frames. Moreover, because it is based entirely on the attention mechanism, our model can be trained in full parallel, which reduces training time. To validate the method, we evaluate it on the two popular datasets SumMe and TVSum. The experimental results show the effectiveness of the proposed method.
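The parallelism claim above rests on replacing recurrence with self-attention: every frame attends to every other frame in a single matrix operation, so no step-by-step loop over time is needed. The paper does not give implementation details, so the following is only a minimal NumPy sketch of standard multi-head self-attention over a sequence of frame features (the shapes, weight names, and head count are illustrative assumptions, not the authors' code):

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(X, Wq, Wk, Wv, Wo, num_heads):
    """Minimal multi-head self-attention over frame features.

    X: (T, d_model) array, one row per video frame.
    Wq, Wk, Wv, Wo: (d_model, d_model) projection matrices (illustrative).
    All T frames are processed at once -- no recurrence over time.
    """
    T, d_model = X.shape
    d_head = d_model // num_heads
    Q, K, V = X @ Wq, X @ Wk, X @ Wv

    def split(M):
        # (T, d_model) -> (num_heads, T, d_head)
        return M.reshape(T, num_heads, d_head).transpose(1, 0, 2)

    Qh, Kh, Vh = split(Q), split(K), split(V)
    # Scaled dot-product attention: (num_heads, T, T) score matrix,
    # letting every frame attend to every other frame in parallel.
    scores = Qh @ Kh.transpose(0, 2, 1) / np.sqrt(d_head)
    attn = softmax(scores, axis=-1)
    out = attn @ Vh                                  # (num_heads, T, d_head)
    out = out.transpose(1, 0, 2).reshape(T, d_model)  # merge heads
    return out @ Wo

# Usage: 6 frames with 8-dim features and 2 heads (arbitrary toy sizes).
rng = np.random.default_rng(0)
T, d_model = 6, 8
X = rng.normal(size=(T, d_model))
Ws = [rng.normal(size=(d_model, d_model)) * 0.1 for _ in range(4)]
Y = multi_head_self_attention(X, *Ws, num_heads=2)
print(Y.shape)
```

Because the whole (T, T) attention matrix is computed in one shot, the per-frame outputs have no sequential dependency, which is what allows fully parallel training in contrast to an RNN's step-by-step updates.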

Original language: English
Pages (from-to): 1-9
Number of pages: 9
Journal: Neurocomputing
Volume: 467
DOI
Publication status: Published - 7 Jan 2022
