TY - JOUR
T1 - Graph Convolutional Dictionary Selection With L2,p Norm for Video Summarization
AU - Ma, Mingyang
AU - Mei, Shaohui
AU - Wan, Shuai
AU - Wang, Zhiyong
AU - Hua, Xian Sheng
AU - Feng, David Dagan
N1 - Publisher Copyright:
© 1992-2012 IEEE.
PY - 2022
Y1 - 2022
AB - Video Summarization (VS) has become one of the most effective solutions for quickly understanding a large volume of video data. Dictionary selection with self representation and sparse regularization has demonstrated its promise for VS by formulating the VS problem as a sparse selection task on video frames. However, existing dictionary selection models are generally designed only for data reconstruction, which results in the neglect of the inherent structured information among video frames. In addition, the sparsity commonly constrained by L_{2,1} norm is not strong enough, which causes the redundancy of keyframes, i.e., similar keyframes are selected. Therefore, to address these two issues, in this paper we propose a general framework called graph convolutional dictionary selection with L_{2,p} (0 < p \leq 1) norm (GCDS_{2,p}) for both keyframe selection and skimming based summarization. Firstly, we incorporate graph embedding into dictionary selection to generate the graph embedding dictionary, which can take the structured information depicted in videos into account. Secondly, we propose to use L_{2,p} (0 < p \leq 1) norm constrained row sparsity, in which p can be flexibly set for two forms of video summarization. For keyframe selection, 0 < p < 1 can be utilized to select diverse and representative keyframes; and for skimming, p = 1 can be utilized to select key shots. In addition, an efficient iterative algorithm is devised to optimize the proposed model, and the convergence is theoretically proved. Experimental results including both keyframe selection and skimming based summarization on four benchmark datasets demonstrate the effectiveness and superiority of the proposed method.
KW - dictionary selection
KW - graph embedding
KW - L2,p norm
KW - Video summarization
UR - http://www.scopus.com/inward/record.url?scp=85124231483&partnerID=8YFLogxK
U2 - 10.1109/TIP.2022.3146012
DO - 10.1109/TIP.2022.3146012
M3 - Article
C2 - 35100116
AN - SCOPUS:85124231483
SN - 1057-7149
VL - 31
SP - 1789
EP - 1804
JO - IEEE Transactions on Image Processing
JF - IEEE Transactions on Image Processing
ER -