Abstract
While most existing video summarization approaches aim to identify important frames of a video from either a global or local perspective, we propose a top-down approach consisting of scene identification and scene summarization. For scene identification, we represent each frame with global features and utilize a scalable clustering method. We then formulate scene summarization as choosing those frames that best cover a set of local descriptors with minimal redundancy. In addition, we develop a visual word-based approach to make our method more computationally scalable. Experimental results on two benchmark datasets demonstrate that our proposed approach clearly outperforms state-of-the-art methods.
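The coverage idea in the abstract can be illustrated with a minimal sketch. It assumes each frame of a scene is reduced to a set of visual word IDs and uses a plain greedy maximum-coverage heuristic. Both the heuristic and the name `greedy_cover` are assumptions made for illustration; the abstract does not state which optimization the authors actually use.

```python
def greedy_cover(frame_words, budget):
    """Pick up to `budget` frames whose visual words cover as much as possible.

    frame_words: dict mapping a frame id to the set of visual word ids
                 observed in that frame (hypothetical input format).
    Returns (chosen frame ids in selection order, set of covered words).
    """
    covered = set()
    chosen = []
    remaining = dict(frame_words)
    while remaining and len(chosen) < budget:
        # Marginal gain = words this frame adds beyond those already covered.
        # Iterating over sorted ids makes ties resolve to the lowest frame id.
        best = max(sorted(remaining), key=lambda f: len(remaining[f] - covered))
        gain = remaining[best] - covered
        if not gain:
            # Every remaining frame is fully redundant, so stop early.
            break
        chosen.append(best)
        covered |= gain
        del remaining[best]
    return chosen, covered


# Toy scene with four frames (illustrative data, not from the paper).
scene = {0: {1, 2, 3}, 1: {3, 4}, 2: {1, 2}, 3: {4, 5, 6, 7}}
keyframes, words = greedy_cover(scene, budget=3)
```

In this toy scene, frames 1 and 2 add no new visual words once frames 3 and 0 are selected, so the loop stops before the budget is spent. That is one simple way to read "minimal redundancy".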
Original language | English |
---|---|
Article number | 4 |
Journal | ACM Transactions on Multimedia Computing, Communications and Applications |
Volume | 11 |
Issue number | 1 |
DOI | |
Publication status | Published - Aug 2014 |