Abstract
While most existing video summarization approaches aim to identify important frames of a video from either a global or local perspective, we propose a top-down approach consisting of scene identification and scene summarization. For scene identification, we represent each frame with global features and utilize a scalable clustering method. We then formulate scene summarization as choosing those frames that best cover a set of local descriptors with minimal redundancy. In addition, we develop a visual word-based variant to make the approach more computationally scalable. Experimental results on two benchmark datasets demonstrate that our proposed approach clearly outperforms the state of the art.
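The scene summarization step, choosing frames that cover a set of local descriptors with little redundancy, can be illustrated with a greedy coverage heuristic. The sketch below is only an illustration under stated assumptions, not the formulation used in the article: it assumes each frame of a scene has already been reduced to a set of visual-word ids, and the names `select_keyframes`, `frame_words`, and `coverage_target` are hypothetical.

```python
def select_keyframes(frame_words, coverage_target=0.9):
    """Greedily pick frames until `coverage_target` of all visual words are covered.

    frame_words: list of sets; frame_words[i] holds the visual-word ids
                 observed in frame i (assumed to be precomputed).
    Returns the indices of the chosen frames, in selection order.
    """
    # All visual words that appear anywhere in the scene.
    universe = set().union(*frame_words) if frame_words else set()
    covered = set()
    chosen = []

    while universe and len(covered) / len(universe) < coverage_target:
        # Marginal gain = words a frame adds beyond what is already covered,
        # so a frame that only repeats covered words scores zero.
        best = max(range(len(frame_words)),
                   key=lambda i: len(frame_words[i] - covered))
        gain = frame_words[best] - covered
        if not gain:
            break  # no remaining frame adds anything new
        chosen.append(best)
        covered |= gain

    return chosen


# Hypothetical scene of four frames described by visual-word ids.
scene = [{1, 2, 3}, {2, 3}, {4, 5}, {1, 4}]
print(select_keyframes(scene, coverage_target=1.0))
```

Scoring frames by marginal gain rather than raw word count is what penalizes redundancy here: the second frame above is never selected because everything it contains is already covered by the first.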
Original language | English |
---|---|
Article number | 4 |
Journal | ACM Transactions on Multimedia Computing, Communications and Applications |
Volume | 11 |
Issue number | 1 |
State | Published - Aug 2014 |
Keywords
- Clustering
- Keyframe extraction
- Keypoint
- Local visual word
- Scene identification