Similarity Based Block Sparse Subset Selection for Video Summarization

Mingyang Ma; Shaohui Mei; Shuai Wan; Zhiyong Wang; David Dagan Feng; Mohammed Bennamoun

doi:10.1109/TCSVT.2020.3044600

School of Electronics and Information

Research output: Contribution to journal › Article › peer-review

20 citations (Scopus)

Abstract

Video summarization (VS) is generally formulated as a subset selection problem where a set of representative keyframes or key segments is selected from an entire video frame set. Though many sparse subset selection based VS algorithms have been proposed in the past decade, most of them adopt a linear sparse formulation in the explicit feature vector space of video frames and do not consider the local or global relationships among frames. In this paper, we first extend the conventional sparse subset selection for VS into kernel block sparse subset selection (KBS3) to utilize the advantage of kernel sparse coding and introduce a local inter-frame relationship through packing of frame blocks. Going a step further, we propose a similarity based block sparse subset selection (SB2S3) model by applying a specially designed transformation matrix on the KBS3 model in order to introduce a kind of global inter-frame relationship through the similarity. Finally, a greedy pursuit based algorithm is devised for the proposed NP-hard model optimization. The proposed SB2S3 has the following advantages: 1) through the similarity between each frame and any other frame, the global relationship among all frames can be considered; 2) through block sparse coding, the local relationship of adjacent frames is further considered; and 3) it has a wider application, since features can derive similarity, but not vice versa. It is believed that the effect of modeling such global and local relationships among frames in this paper is similar to that of modeling the long-range and short-range dependencies among frames in deep learning based methods. Experimental results on three benchmark datasets have demonstrated that the proposed approach is superior to not only other sparse subset selection based VS methods but also most unsupervised deep-learning based VS methods.
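
The general idea of greedily picking blocks of adjacent frames from a pairwise similarity matrix can be illustrated with a minimal sketch. This is not the SB2S3 model or the authors' greedy pursuit algorithm, whose details are not given in the abstract. It is a generic greedy forward selection that, at each step, adds the block of frames that most reduces the kernel-space reconstruction error of all frames. The function names `residual` and `greedy_block_selection`, the fixed non-overlapping block layout, and the assumption that the similarity matrix `K` is symmetric positive semidefinite are all illustrative choices.

```python
import numpy as np


def residual(K, idx):
    """Kernel-space error of reconstructing every frame from the frames in idx.

    K is assumed to be a symmetric positive semidefinite similarity (kernel)
    matrix. The error is trace(K) - trace(K[:, idx] pinv(K[idx, idx]) K[idx, :]).
    """
    if len(idx) == 0:
        return float(np.trace(K))
    K_ss = K[np.ix_(idx, idx)]   # similarities among the selected frames
    K_as = K[:, idx]             # similarities between all frames and the selected ones
    return float(np.trace(K) - np.trace(K_as @ np.linalg.pinv(K_ss) @ K_as.T))


def greedy_block_selection(K, block_size, num_blocks):
    """Greedily pick num_blocks blocks of consecutive frames (hypothetical helper).

    Frames are packed into non-overlapping blocks of block_size frames, an
    assumption made here for simplicity. Each round adds the block whose
    inclusion gives the lowest reconstruction error over all frames.
    """
    n = K.shape[0]
    blocks = [list(range(s, min(s + block_size, n))) for s in range(0, n, block_size)]
    chosen, selected = [], []
    for _ in range(min(num_blocks, len(blocks))):
        best_b, best_err = None, np.inf
        for b, blk in enumerate(blocks):
            if b in chosen:
                continue
            err = residual(K, selected + blk)
            if err < best_err:
                best_b, best_err = b, err
        chosen.append(best_b)
        selected += blocks[best_b]
    return sorted(chosen), sorted(selected)
```

Calling `greedy_block_selection(K, block_size=2, num_blocks=3)` on a frame-by-frame similarity matrix returns the indices of the chosen blocks and of the frames they contain. Because only `K` is needed, the sketch works whenever pairwise similarities are available, even without explicit feature vectors, which mirrors the abstract's remark that features can derive similarity but not vice versa.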

Original language	English
Pages (from-to)	3967-3980
Number of pages	14
Journal	IEEE Transactions on Circuits and Systems for Video Technology
Volume	31
Issue number	10
DOI	https://doi.org/10.1109/TCSVT.2020.3044600
Publication status	Published - 1 Oct 2021

Cite this

@article{3afa453e6ab649fca09f98e780e97e63,

title = "Similarity Based Block Sparse Subset Selection for Video Summarization",

abstract = "Video summarization (VS) is generally formulated as a subset selection problem where a set of representative keyframes or key segments is selected from an entire video frame set. Though many sparse subset selection based VS algorithms have been proposed in the past decade, most of them adopt linear sparse formulation in the explicit feature vector space of video frames, and don't consider the local or global relationships among frames. In this paper, we first extend the conventional sparse subset selection for VS into kernel block sparse subset selection (KBS3) to utilize the advantage of kernel sparse coding and introduce a local inter-frame relationship through packing of frame blocks. Going a step further, we propose a similarity based block sparse subset selection (SB2S3) model by applying a specially designed transformation matrix on the KBS3 model in order to introduce a kind of global inter-frame relationship through the similarity. Finally, a greedy pursuit based algorithm is devised for the proposed NP-hard model optimization. The proposed SB2S3 has the following advantages: 1) through the similarity between each frame and any other frame, the global relationship among all frames can be considered; 2) through block sparse coding, the local relationship of adjacent frames is further considered; and 3) it has a wider application, since features can derive similarity, but not vice versa. It is believed that the effect of modeling such global and local relationships among frames in this paper, is similar to that of modeling the long-range and short-range dependencies among frames in deep learning based methods. Experimental results on three benchmark datasets have demonstrated that the proposed approach is superior to not only other sparse subset selection based VS methods but also most unsupervised deep-learning based VS methods.",

keywords = "block sparsity, Kernel sparse representation, similarity, Video summarization",

author = "Mingyang Ma and Shaohui Mei and Shuai Wan and Zhiyong Wang and Feng, {David Dagan} and Mohammed Bennamoun",

note = "Publisher Copyright: {\textcopyright} 1991-2012 IEEE.",

year = "2021",

month = oct,

day = "1",

doi = "10.1109/TCSVT.2020.3044600",

language = "English",

volume = "31",

pages = "3967--3980",

journal = "IEEE Transactions on Circuits and Systems for Video Technology",

issn = "1051-8215",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

number = "10",

}

TY - JOUR

T1 - Similarity Based Block Sparse Subset Selection for Video Summarization

AU - Ma, Mingyang

AU - Mei, Shaohui

AU - Wan, Shuai

AU - Wang, Zhiyong

AU - Feng, David Dagan

AU - Bennamoun, Mohammed

PY - 2021/10/1

Y1 - 2021/10/1

N2 - Video summarization (VS) is generally formulated as a subset selection problem where a set of representative keyframes or key segments is selected from an entire video frame set. Though many sparse subset selection based VS algorithms have been proposed in the past decade, most of them adopt linear sparse formulation in the explicit feature vector space of video frames, and don't consider the local or global relationships among frames. In this paper, we first extend the conventional sparse subset selection for VS into kernel block sparse subset selection (KBS3) to utilize the advantage of kernel sparse coding and introduce a local inter-frame relationship through packing of frame blocks. Going a step further, we propose a similarity based block sparse subset selection (SB2S3) model by applying a specially designed transformation matrix on the KBS3 model in order to introduce a kind of global inter-frame relationship through the similarity. Finally, a greedy pursuit based algorithm is devised for the proposed NP-hard model optimization. The proposed SB2S3 has the following advantages: 1) through the similarity between each frame and any other frame, the global relationship among all frames can be considered; 2) through block sparse coding, the local relationship of adjacent frames is further considered; and 3) it has a wider application, since features can derive similarity, but not vice versa. It is believed that the effect of modeling such global and local relationships among frames in this paper, is similar to that of modeling the long-range and short-range dependencies among frames in deep learning based methods. Experimental results on three benchmark datasets have demonstrated that the proposed approach is superior to not only other sparse subset selection based VS methods but also most unsupervised deep-learning based VS methods.

KW - block sparsity

KW - Kernel sparse representation

KW - similarity

KW - Video summarization

UR - http://www.scopus.com/inward/record.url?scp=85098780302&partnerID=8YFLogxK

U2 - 10.1109/TCSVT.2020.3044600

DO - 10.1109/TCSVT.2020.3044600

M3 - Article

AN - SCOPUS:85098780302

SN - 1051-8215

VL - 31

SP - 3967

EP - 3980

JO - IEEE Transactions on Circuits and Systems for Video Technology

JF - IEEE Transactions on Circuits and Systems for Video Technology

IS - 10

ER -
