Video summarization with a dual-path attentive network

Guoqiang Liang, Yanbing Lv, Shucheng Li, Xiahong Wang, Yanning Zhang

Research output: Contribution to journal › Article › peer-review

Abstract

With the explosive growth of videos captured every day, efficiently extracting useful information from them has become an increasingly important problem. As one of the most effective approaches, video summarization, which aims to extract the most important frames or shots, has attracted growing interest recently. Most current methods employ a recurrent structure; however, the step-by-step nature of recurrence makes these models difficult to parallelize. To address this problem, we propose a dual-path attentive video summarization framework consisting of a temporal-spatial encoder, a score-aware encoder and a decoder, all built mainly on multi-head self-attention and the convolutional block attention module (CBAM). The temporal-spatial encoder captures temporal and spatial information, while the score-aware encoder incorporates appearance features with previously predicted frame-level importance scores. By combining the scores and appearance features, the model better captures long-range global dependencies and continuously updates the importance scores of previous frames. Moreover, because it is based entirely on attention mechanisms, the model can be trained fully in parallel, which reduces training time. To validate the method, we evaluate it on two popular datasets, SumMe and TVSum. The experimental results demonstrate the effectiveness of the proposed method.
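To make the described architecture concrete, the following is a minimal PyTorch sketch of a dual-path attentive summarizer. The class names (DualPathSummarizer, ScoreAwareEncoder), feature dimensions, and concatenation-based fusion are illustrative assumptions, not the authors' implementation; the CBAM spatial branch is omitted for brevity, with each path reduced to multi-head self-attention over per-frame features.

# Hedged sketch of the dual-path design described in the abstract.
# Names, dimensions, and the fusion strategy are assumptions for
# illustration; they are not the paper's actual implementation.
import torch
import torch.nn as nn


class ScoreAwareEncoder(nn.Module):
    """Fuses appearance features with previously predicted importance scores."""
    def __init__(self, dim, heads=8):
        super().__init__()
        self.score_proj = nn.Linear(1, dim)  # lift scalar scores to feature dim
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, feats, scores):
        # feats: (B, T, dim) appearance features; scores: (B, T) in [0, 1]
        x = feats + self.score_proj(scores.unsqueeze(-1))
        out, _ = self.attn(x, x, x)  # global self-attention over all frames
        return self.norm(x + out)


class DualPathSummarizer(nn.Module):
    def __init__(self, dim=1024, heads=8):
        super().__init__()
        # Temporal path: multi-head self-attention over frame features
        # (the paper's spatial CBAM branch is omitted here).
        self.temporal_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.temporal_norm = nn.LayerNorm(dim)
        self.score_encoder = ScoreAwareEncoder(dim, heads)
        # Decoder head: per-frame importance score in [0, 1].
        self.decoder = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(),
                                     nn.Linear(dim, 1), nn.Sigmoid())

    def forward(self, feats, prev_scores):
        t_out, _ = self.temporal_attn(feats, feats, feats)
        t_out = self.temporal_norm(feats + t_out)
        s_out = self.score_encoder(feats, prev_scores)
        fused = torch.cat([t_out, s_out], dim=-1)  # join the two paths
        return self.decoder(fused).squeeze(-1)     # (B, T) frame scores


# Usage: 2 videos, 120 frames each, 1024-d CNN features per frame.
model = DualPathSummarizer()
feats = torch.randn(2, 120, 1024)
scores = torch.full((2, 120), 0.5)  # uniform initial importance scores
print(model(feats, scores).shape)   # torch.Size([2, 120])

Because every frame attends to all others in a single pass, the forward computation has no sequential dependence across time steps, which is the source of the parallelism advantage the abstract contrasts with recurrent models.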

Original language: English
Pages (from-to): 1-9
Number of pages: 9
Journal: Neurocomputing
Volume: 467
DOIs
State: Published - 7 Jan 2022

Keywords

  • Attention mechanism
  • Encoder-decoder
  • Video summarization
