A top-down approach for video summarization

Genliang Guan; Zhiyong Wang; Shaohui Mei; Max Ott; Mingyi He; David Dagan Feng

doi:10.1145/2632267

School of Electronics and Information

Research output: Contribution to journal › Article › peer-review

40 Scopus citations

Abstract

While most existing video summarization approaches aim to identify important frames of a video from either a global or local perspective, we propose a top-down approach consisting of scene identification and scene summarization. For scene identification, we represent each frame with global features and utilize a scalable clustering method. We then formulate scene summarization as choosing those frames that best cover a set of local descriptors with minimal redundancy. In addition, we develop a visual word-based approach to make our approach more computationally scalable. Experimental results on two benchmark datasets demonstrate that our proposed approach clearly outperforms the state-of-the-art.
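The scene summarization step described in the abstract (choosing frames that best cover a set of local descriptors with minimal redundancy) can be illustrated with a small sketch. This is not the authors' formulation, which the abstract does not spell out; it assumes a plain greedy maximum-coverage heuristic over quantized descriptors. The function name greedy_keyframes, the coverage parameter, and the representation of each frame as a set of visual-word IDs are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Set


def greedy_keyframes(frame_words: Dict[int, Set[int]], coverage: float = 0.9) -> List[int]:
    """Greedily pick frames until `coverage` of all visual words in the scene is covered.

    Assumption: each frame is given as a set of visual-word IDs (quantized local
    descriptors). This is a generic greedy max-coverage heuristic, not the
    paper's actual optimization.
    """
    universe: Set[int] = set().union(*frame_words.values()) if frame_words else set()
    target = coverage * len(universe)
    covered: Set[int] = set()
    selected: List[int] = []
    while len(covered) < target:
        # Frame adding the most not-yet-covered words; ties go to the lowest frame index.
        best = max(sorted(frame_words), key=lambda f: len(frame_words[f] - covered))
        gain = frame_words[best] - covered
        if not gain:
            break  # no remaining frame adds anything new
        selected.append(best)
        covered |= gain
    return selected


if __name__ == "__main__":
    # Toy scene: four frames, eight distinct visual words.
    scene = {0: {1, 2, 3}, 1: {3, 4}, 2: {1, 2}, 3: {5, 6, 7, 8}}
    print(greedy_keyframes(scene, coverage=1.0))
```

In this toy scene, frame 2 is never chosen because every word it contains is already covered by frame 0, which is the sense in which a coverage objective discourages redundant keyframes.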

Original language	English
Article number	4
Journal	ACM Transactions on Multimedia Computing, Communications and Applications
Volume	11
Issue number	1
DOIs	https://doi.org/10.1145/2632267
State	Published - Aug 2014

Keywords

Clustering
Keyframe extraction
Keypoint
Local visual word
Scene identification

Cite this

@article{f4abc359a24c41c08d8e2d3fbe21e2da,

title = "A top-down approach for video summarization",

abstract = "While most existing video summarization approaches aim to identify important frames of a video from either a global or local perspective, we propose a top-down approach consisting of scene identification and scene summarization. For scene identification, we represent each frame with global features and utilize a scalable clustering method. We then formulate scene summarization as choosing those frames that best cover a set of local descriptors with minimal redundancy. In addition, we develop a visual word-based approach to make our approach more computationally scalable. Experimental results on two benchmark datasets demonstrate that our proposed approach clearly outperforms the state-of-the-art.",

keywords = "Clustering, Keyframe extraction, Keypoint, Local visual word, Scene identification",

author = "Genliang Guan and Zhiyong Wang and Shaohui Mei and Max Ott and Mingyi He and Feng, {David Dagan}",

year = "2014",

month = aug,

doi = "10.1145/2632267",

language = "English",

volume = "11",

journal = "ACM Transactions on Multimedia Computing, Communications and Applications",

issn = "1551-6857",

publisher = "Association for Computing Machinery (ACM)",

number = "1",

}

TY - JOUR

T1 - A top-down approach for video summarization

AU - Guan, Genliang

AU - Wang, Zhiyong

AU - Mei, Shaohui

AU - Ott, Max

AU - He, Mingyi

AU - Feng, David Dagan

PY - 2014/8

Y1 - 2014/8

N2 - While most existing video summarization approaches aim to identify important frames of a video from either a global or local perspective, we propose a top-down approach consisting of scene identification and scene summarization. For scene identification, we represent each frame with global features and utilize a scalable clustering method. We then formulate scene summarization as choosing those frames that best cover a set of local descriptors with minimal redundancy. In addition, we develop a visual word-based approach to make our approach more computationally scalable. Experimental results on two benchmark datasets demonstrate that our proposed approach clearly outperforms the state-of-the-art.

AB - While most existing video summarization approaches aim to identify important frames of a video from either a global or local perspective, we propose a top-down approach consisting of scene identification and scene summarization. For scene identification, we represent each frame with global features and utilize a scalable clustering method. We then formulate scene summarization as choosing those frames that best cover a set of local descriptors with minimal redundancy. In addition, we develop a visual word-based approach to make our approach more computationally scalable. Experimental results on two benchmark datasets demonstrate that our proposed approach clearly outperforms the state-of-the-art.

KW - Clustering

KW - Keyframe extraction

KW - Keypoint

KW - Local visual word

KW - Scene identification

UR - http://www.scopus.com/inward/record.url?scp=84906857262&partnerID=8YFLogxK

U2 - 10.1145/2632267

DO - 10.1145/2632267

M3 - Article

AN - SCOPUS:84906857262

SN - 1551-6857

VL - 11

JO - ACM Transactions on Multimedia Computing, Communications and Applications

JF - ACM Transactions on Multimedia Computing, Communications and Applications

IS - 1

M1 - 4

ER -
