Key Frame Extraction in the Summary Space

Xuelong Li; Bin Zhao; Xiaoqiang Lu

doi:10.1109/TCYB.2017.2718579

Institute of Optoelectronics and Intelligence

CAS - Xi'an Institute of Optics and Precision Mechanics

Research output: Contribution to journal › Article › Peer-reviewed

44 citations (Scopus)

Abstract

Key frame extraction is an efficient way to create a video summary, which helps users quickly comprehend the video content. Generally, key frames should be representative of the video content and, at the same time, diverse to reduce redundancy. Based on the assumption that the video data lie near a subspace of a high-dimensional space, this paper proposes a new approach, named key frame extraction in the summary space. The proposed approach aims to find the representative frames of the video and filter out similar frames from the representative frame set. First, the video data are mapped to a high-dimensional space, named the summary space. Then, a new representation is learned for each frame by analyzing the intrinsic structure of the summary space. Specifically, the learned representation reflects the representativeness of the frame and is used to select representative frames. Next, the perceptual hash algorithm is employed to measure the similarity of representative frames. The key frame set is then obtained by filtering out similar frames from the representative frame set. Finally, the video summary is constructed by arranging the key frames in temporal order. Additionally, the ground truth, created by filtering out similar frames from human-created summaries, is used to evaluate the quality of the video summary. Compared with several traditional approaches, the experimental results on 80 videos from two datasets indicate the superior performance of our approach.
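
The filtering step described in the abstract (hash the representative frames, then drop those too similar to one already kept) can be illustrated with a minimal sketch. The abstract does not say which perceptual hash variant or similarity threshold the authors use, so the code below substitutes a simple average hash and a hypothetical `max_distance` of 5 bits; the function names are likewise illustrative and are not taken from the paper. Frames are assumed to be grayscale images given as nested sequences of pixel intensities.

```python
from typing import List, Sequence

# Assumption: a frame is a grayscale image given as rows of pixel intensities.
Frame = Sequence[Sequence[float]]


def average_hash(frame: Frame, size: int = 8) -> int:
    """Return a size*size-bit average hash (a simple perceptual hash variant)."""
    height, width = len(frame), len(frame[0])
    if height < size or width < size:
        raise ValueError("frame must be at least size x size pixels")
    # Downsample to a size x size grid by averaging each block of pixels.
    cells = []
    for i in range(size):
        r0, r1 = i * height // size, (i + 1) * height // size
        for j in range(size):
            c0, c1 = j * width // size, (j + 1) * width // size
            block = [frame[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            cells.append(sum(block) / len(block))
    # Each bit records whether a cell is brighter than the grid mean.
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Count the bits in which two hashes differ."""
    return bin(a ^ b).count("1")


def filter_similar(frames: Sequence[Frame], max_distance: int = 5) -> List[int]:
    """Return indices of frames to keep, in temporal order.

    A frame is kept only if its hash differs from every previously kept
    frame by more than max_distance bits (hypothetical threshold).
    """
    kept_indices: List[int] = []
    kept_hashes: List[int] = []
    for index, frame in enumerate(frames):
        frame_hash = average_hash(frame)
        if all(hamming_distance(frame_hash, k) > max_distance for k in kept_hashes):
            kept_indices.append(index)
            kept_hashes.append(frame_hash)
    return kept_indices
```

Calling `filter_similar(representative_frames)` returns the indices of the frames that survive filtering. Because frames are visited in order, the result is already in temporal order, matching the final step in the abstract. The greedy keep-first policy is a design choice of this sketch, not something the abstract specifies.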

Original language	English
Pages (from-to)	1923-1934
Number of pages	12
Journal	IEEE Transactions on Cybernetics
Volume	48
Issue	6
DOI	https://doi.org/10.1109/TCYB.2017.2718579
Publication status	Published - Jun 2018


Cite this

@article{02dfd2f399924075bd6f8702c0438f64,

title = "Key Frame Extraction in the Summary Space",

abstract = "Key frame extraction is an efficient way to create the video summary which helps users obtain a quick comprehension of the video content. Generally, the key frames should be representative of the video content, meanwhile, diverse to reduce the redundancy. Based on the assumption that the video data are near a subspace of a high-dimensional space, a new approach, named as key frame extraction in the summary space, is proposed for key frame extraction in this paper. The proposed approach aims to find the representative frames of the video and filter out similar frames from the representative frame set. First of all, the video data are mapped to a high-dimensional space, named as summary space. Then, a new representation is learned for each frame by analyzing the intrinsic structure of the summary space. Specifically, the learned representation can reflect the representativeness of the frame, and is utilized to select representative frames. Next, the perceptual hash algorithm is employed to measure the similarity of representative frames. As a result, the key frame set is obtained after filtering out similar frames from the representative frame set. Finally, the video summary is constructed by assigning the key frames in temporal order. Additionally, the ground truth, created by filtering out similar frames from human-created summaries, is utilized to evaluate the quality of the video summary. Compared with several traditional approaches, the experimental results on 80 videos from two datasets indicate the superior performance of our approach.",

keywords = "Diverse, key frame, representative, summary space",

author = "Xuelong Li and Bin Zhao and Xiaoqiang Lu",

note = "Publisher Copyright: {\textcopyright} 2013 IEEE.",

year = "2018",

month = jun,

doi = "10.1109/TCYB.2017.2718579",

language = "English",

volume = "48",

pages = "1923--1934",

journal = "IEEE Transactions on Cybernetics",

issn = "2168-2267",

publisher = "IEEE Advancing Technology for Humanity",

number = "6",

}

TY - JOUR

T1 - Key Frame Extraction in the Summary Space

AU - Li, Xuelong

AU - Zhao, Bin

AU - Lu, Xiaoqiang

PY - 2018/6

Y1 - 2018/6

N2 - Key frame extraction is an efficient way to create the video summary which helps users obtain a quick comprehension of the video content. Generally, the key frames should be representative of the video content, meanwhile, diverse to reduce the redundancy. Based on the assumption that the video data are near a subspace of a high-dimensional space, a new approach, named as key frame extraction in the summary space, is proposed for key frame extraction in this paper. The proposed approach aims to find the representative frames of the video and filter out similar frames from the representative frame set. First of all, the video data are mapped to a high-dimensional space, named as summary space. Then, a new representation is learned for each frame by analyzing the intrinsic structure of the summary space. Specifically, the learned representation can reflect the representativeness of the frame, and is utilized to select representative frames. Next, the perceptual hash algorithm is employed to measure the similarity of representative frames. As a result, the key frame set is obtained after filtering out similar frames from the representative frame set. Finally, the video summary is constructed by assigning the key frames in temporal order. Additionally, the ground truth, created by filtering out similar frames from human-created summaries, is utilized to evaluate the quality of the video summary. Compared with several traditional approaches, the experimental results on 80 videos from two datasets indicate the superior performance of our approach.

AB - Key frame extraction is an efficient way to create the video summary which helps users obtain a quick comprehension of the video content. Generally, the key frames should be representative of the video content, meanwhile, diverse to reduce the redundancy. Based on the assumption that the video data are near a subspace of a high-dimensional space, a new approach, named as key frame extraction in the summary space, is proposed for key frame extraction in this paper. The proposed approach aims to find the representative frames of the video and filter out similar frames from the representative frame set. First of all, the video data are mapped to a high-dimensional space, named as summary space. Then, a new representation is learned for each frame by analyzing the intrinsic structure of the summary space. Specifically, the learned representation can reflect the representativeness of the frame, and is utilized to select representative frames. Next, the perceptual hash algorithm is employed to measure the similarity of representative frames. As a result, the key frame set is obtained after filtering out similar frames from the representative frame set. Finally, the video summary is constructed by assigning the key frames in temporal order. Additionally, the ground truth, created by filtering out similar frames from human-created summaries, is utilized to evaluate the quality of the video summary. Compared with several traditional approaches, the experimental results on 80 videos from two datasets indicate the superior performance of our approach.

KW - Diverse

KW - key frame

KW - representative

KW - summary space

UR - http://www.scopus.com/inward/record.url?scp=85023169635&partnerID=8YFLogxK

U2 - 10.1109/TCYB.2017.2718579

DO - 10.1109/TCYB.2017.2718579

M3 - Article

C2 - 28693004

AN - SCOPUS:85023169635

SN - 2168-2267

VL - 48

SP - 1923

EP - 1934

JO - IEEE Transactions on Cybernetics

JF - IEEE Transactions on Cybernetics

IS - 6

ER -
