Graph Convolutional Dictionary Selection With L₂,p Norm for Video Summarization

Mingyang Ma; Shaohui Mei; Shuai Wan; Zhiyong Wang; Xian Sheng Hua; David Dagan Feng

doi:10.1109/TIP.2022.3146012

School of Electronics and Information

Research output: Contribution to journal › Article › Peer-review

16 Citations (Scopus)

Abstract

Video Summarization (VS) has become one of the most effective solutions for quickly understanding a large volume of video data. Dictionary selection with self representation and sparse regularization has demonstrated its promise for VS by formulating the VS problem as a sparse selection task on video frames. However, existing dictionary selection models are generally designed only for data reconstruction, which results in the neglect of the inherent structured information among video frames. In addition, the sparsity commonly constrained by $L_{2,1}$ norm is not strong enough, which causes the redundancy of keyframes, i.e., similar keyframes are selected. Therefore, to address these two issues, in this paper we propose a general framework called graph convolutional dictionary selection with $L_{2,p}$ ($0 < p \leq 1$) norm (GCDS$_{2,p}$) for both keyframe selection and skimming based summarization. Firstly, we incorporate graph embedding into dictionary selection to generate the graph embedding dictionary, which can take the structured information depicted in videos into account. Secondly, we propose to use $L_{2,p}$ ($0 < p \leq 1$) norm constrained row sparsity, in which $p$ can be flexibly set for two forms of video summarization. For keyframe selection, $0 < p < 1$ can be utilized to select diverse and representative keyframes; and for skimming, $p = 1$ can be utilized to select key shots. In addition, an efficient iterative algorithm is devised to optimize the proposed model, and the convergence is theoretically proved. Experimental results including both keyframe selection and skimming based summarization on four benchmark datasets demonstrate the effectiveness and superiority of the proposed method.
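For reference, the row-sparse penalty named in the abstract and a generic self-representation dictionary selection objective can be written as follows. This is an assumed, generic form from the dictionary selection setting, not the exact GCDS$_{2,p}$ model: it omits the graph embedding dictionary, whose construction the abstract does not specify. Here $X \in \mathbb{R}^{d \times n}$ is assumed to hold one $d$-dimensional feature vector per frame as a column, $w^i$ is the $i$-th row of $W$, and $\lambda > 0$ is a trade-off weight.

```latex
\begin{align}
\|W\|_{2,p} &= \Big( \sum_{i=1}^{n} \|w^{i}\|_2^{\,p} \Big)^{1/p}, \qquad 0 < p \leq 1, \\
\|W\|_{2,p}^{p} &= \sum_{i=1}^{n} \|w^{i}\|_2^{\,p}, \\
\min_{W \in \mathbb{R}^{n \times n}} \; & \|X - XW\|_F^2 + \lambda \, \|W\|_{2,p}^{p}.
\end{align}
```

Entry $w_{ij}$ weights frame $i$ in reconstructing frame $j$, so a nonzero row $w^i$ marks frame $i$ as a selected dictionary atom. With $p = 1$ the penalty is the convex $L_{2,1}$ norm; with $0 < p < 1$ it becomes a nonconvex quasi-norm that drives more rows to exactly zero, which is the stronger row sparsity the abstract argues for in keyframe selection.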

Original language	English
Pages (from-to)	1789-1804
Number of pages	16
Journal	IEEE Transactions on Image Processing
Volume	31
DOI	https://doi.org/10.1109/TIP.2022.3146012
Publication status	Published - 2022
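The row-sparse frame selection described in the abstract can be illustrated with a minimal sketch that minimizes the generic objective $\|X - XW\|_F^2 + \lambda \sum_i \|w^i\|_2^p$ by iteratively reweighted least squares (IRLS), then ranks frames by the L2 norms of their rows in $W$. This is an assumption-laden illustration, not the authors' optimization algorithm: IRLS is swapped in as a well-known solver for $L_{2,p}$-regularized least squares, the graph embedding dictionary is omitted, and the function names `l2p_norm` and `select_frames`, the defaults for `lam`, `p`, `n_iter`, and the smoothing constant `eps` are all hypothetical.

```python
import numpy as np


def l2p_norm(W: np.ndarray, p: float) -> float:
    """Return ||W||_{2,p} = (sum_i ||w^i||_2^p)^(1/p) over the rows w^i of W."""
    row_norms = np.linalg.norm(W, axis=1)
    return float(np.sum(row_norms ** p) ** (1.0 / p))


def select_frames(X: np.ndarray, k: int, lam: float = 1.0, p: float = 0.5,
                  n_iter: int = 50, eps: float = 1e-8) -> np.ndarray:
    """Generic IRLS sketch (hypothetical; not the GCDS_{2,p} algorithm).

    Approximately minimizes ||X - XW||_F^2 + lam * sum_i ||w^i||_2^p, where X
    is d x n with one frame feature vector per column. Returns the indices of
    the k frames whose rows in W have the largest L2 norms.
    """
    G = X.T @ X                                   # n x n Gram matrix of frames
    n = G.shape[0]
    # Ridge initialisation: the reweighted system with all weights equal to 1.
    W = np.linalg.solve(G + lam * np.eye(n), G)
    for _ in range(n_iter):
        # Smoothed squared row norms; eps keeps the weights finite at zero rows.
        row_sq = np.sum(W ** 2, axis=1) + eps
        # Reweighting d_i = (p / 2) * ||w^i||_2^(p - 2): small rows get large
        # weights and are pushed further toward zero on the next solve.
        d = 0.5 * p * row_sq ** ((p - 2.0) / 2.0)
        # Stationarity of the reweighted problem: (G + lam * D) W = G.
        W = np.linalg.solve(G + lam * np.diag(d), G)
    scores = np.linalg.norm(W, axis=1)            # one score per frame
    return np.argsort(-scores)[:k]
```

With `p=1.0` the weights reduce to $d_i = 1/(2\|w^i\|_2)$, the usual reweighting for the $L_{2,1}$ norm. Each iteration solves an $n \times n$ linear system, so this sketch costs roughly $O(n^3)$ per iteration in the number of frames and is meant for short clips or pre-sampled frames only.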


Cite this

@article{6551d76d199e4aef96a703abd998e9aa,

title = "Graph Convolutional Dictionary Selection With {$L_{2,p}$} Norm for Video Summarization",

abstract = "Video Summarization (VS) has become one of the most effective solutions for quickly understanding a large volume of video data. Dictionary selection with self representation and sparse regularization has demonstrated its promise for VS by formulating the VS problem as a sparse selection task on video frames. However, existing dictionary selection models are generally designed only for data reconstruction, which results in the neglect of the inherent structured information among video frames. In addition, the sparsity commonly constrained by $L_{2,1}$ norm is not strong enough, which causes the redundancy of keyframes, i.e., similar keyframes are selected. Therefore, to address these two issues, in this paper we propose a general framework called graph convolutional dictionary selection with $L_{2,p}$ ($0 < p \leq 1$) norm (GCDS$_{2,p}$) for both keyframe selection and skimming based summarization. Firstly, we incorporate graph embedding into dictionary selection to generate the graph embedding dictionary, which can take the structured information depicted in videos into account. Secondly, we propose to use $L_{2,p}$ ($0 < p \leq 1$) norm constrained row sparsity, in which $p$ can be flexibly set for two forms of video summarization. For keyframe selection, $0 < p < 1$ can be utilized to select diverse and representative keyframes; and for skimming, $p = 1$ can be utilized to select key shots. In addition, an efficient iterative algorithm is devised to optimize the proposed model, and the convergence is theoretically proved. Experimental results including both keyframe selection and skimming based summarization on four benchmark datasets demonstrate the effectiveness and superiority of the proposed method.",

keywords = "dictionary selection, graph embedding, $L_{2,p}$ norm, Video summarization",

author = "Mingyang Ma and Shaohui Mei and Shuai Wan and Zhiyong Wang and Hua, {Xian Sheng} and Feng, {David Dagan}",

note = "Publisher Copyright: {\textcopyright} 1992-2012 IEEE.",

year = "2022",

doi = "10.1109/TIP.2022.3146012",

language = "English",

volume = "31",

pages = "1789--1804",

journal = "IEEE Transactions on Image Processing",

issn = "1057-7149",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

}

TY  - JOUR
T1  - Graph Convolutional Dictionary Selection With L₂,p Norm for Video Summarization
AU  - Ma, Mingyang
AU  - Mei, Shaohui
AU  - Wan, Shuai
AU  - Wang, Zhiyong
AU  - Hua, Xian Sheng
AU  - Feng, David Dagan
PY  - 2022
Y1  - 2022
N2  - Video Summarization (VS) has become one of the most effective solutions for quickly understanding a large volume of video data. Dictionary selection with self representation and sparse regularization has demonstrated its promise for VS by formulating the VS problem as a sparse selection task on video frames. However, existing dictionary selection models are generally designed only for data reconstruction, which results in the neglect of the inherent structured information among video frames. In addition, the sparsity commonly constrained by $L_{2,1}$ norm is not strong enough, which causes the redundancy of keyframes, i.e., similar keyframes are selected. Therefore, to address these two issues, in this paper we propose a general framework called graph convolutional dictionary selection with $L_{2,p}$ ($0 < p \leq 1$) norm (GCDS$_{2,p}$) for both keyframe selection and skimming based summarization. Firstly, we incorporate graph embedding into dictionary selection to generate the graph embedding dictionary, which can take the structured information depicted in videos into account. Secondly, we propose to use $L_{2,p}$ ($0 < p \leq 1$) norm constrained row sparsity, in which $p$ can be flexibly set for two forms of video summarization. For keyframe selection, $0 < p < 1$ can be utilized to select diverse and representative keyframes; and for skimming, $p = 1$ can be utilized to select key shots. In addition, an efficient iterative algorithm is devised to optimize the proposed model, and the convergence is theoretically proved. Experimental results including both keyframe selection and skimming based summarization on four benchmark datasets demonstrate the effectiveness and superiority of the proposed method.
AB  - Video Summarization (VS) has become one of the most effective solutions for quickly understanding a large volume of video data. Dictionary selection with self representation and sparse regularization has demonstrated its promise for VS by formulating the VS problem as a sparse selection task on video frames. However, existing dictionary selection models are generally designed only for data reconstruction, which results in the neglect of the inherent structured information among video frames. In addition, the sparsity commonly constrained by $L_{2,1}$ norm is not strong enough, which causes the redundancy of keyframes, i.e., similar keyframes are selected. Therefore, to address these two issues, in this paper we propose a general framework called graph convolutional dictionary selection with $L_{2,p}$ ($0 < p \leq 1$) norm (GCDS$_{2,p}$) for both keyframe selection and skimming based summarization. Firstly, we incorporate graph embedding into dictionary selection to generate the graph embedding dictionary, which can take the structured information depicted in videos into account. Secondly, we propose to use $L_{2,p}$ ($0 < p \leq 1$) norm constrained row sparsity, in which $p$ can be flexibly set for two forms of video summarization. For keyframe selection, $0 < p < 1$ can be utilized to select diverse and representative keyframes; and for skimming, $p = 1$ can be utilized to select key shots. In addition, an efficient iterative algorithm is devised to optimize the proposed model, and the convergence is theoretically proved. Experimental results including both keyframe selection and skimming based summarization on four benchmark datasets demonstrate the effectiveness and superiority of the proposed method.
KW  - dictionary selection
KW  - graph embedding
KW  - L₂,p norm
KW  - Video summarization
UR  - http://www.scopus.com/inward/record.url?scp=85124231483&partnerID=8YFLogxK
U2  - 10.1109/TIP.2022.3146012
DO  - 10.1109/TIP.2022.3146012
M3  - Article
C2  - 35100116
AN  - SCOPUS:85124231483
SN  - 1057-7149
VL  - 31
SP  - 1789
EP  - 1804
JO  - IEEE Transactions on Image Processing
JF  - IEEE Transactions on Image Processing
ER  - 
