TY - GEN
T1 - Exploring the influence of feature representation for dictionary selection based video summarization
AU - Ma, Mingyang
AU - Mei, Shaohui
AU - Ji, Jingyu
AU - Wan, Shuai
AU - Wang, Zhiyong
AU - Feng, Dagan
N1 - Publisher Copyright:
© 2017 IEEE.
PY - 2017/7/2
Y1 - 2017/7/2
N2 - Dictionary selection based video summarization (VS) algorithms, in which keyframes are treated as a dictionary to reconstruct all video frames, have been demonstrated to be effective and efficient for video summarization. It has been observed that the feature representation of video frames has a great impact on the performance of VS. In this paper, the influence of feature representation on the performance of dictionary selection based VS is investigated for the first time. In addition to the traditional hand-crafted features used in VS, such as the color histogram, deep features learned by deep neural networks are used for the first time to represent video frames for dictionary selection based VS. The impact of dimensionality reduction of the high-dimensional deep features on VS is further discussed. Experimental results on a benchmark video dataset demonstrate that deep learning features achieve better performance than traditional hand-crafted features for dictionary selection based VS. Moreover, the dimensionality of deep learning features can be reduced to lower the computational cost without degrading VS performance.
AB - Dictionary selection based video summarization (VS) algorithms, in which keyframes are treated as a dictionary to reconstruct all video frames, have been demonstrated to be effective and efficient for video summarization. It has been observed that the feature representation of video frames has a great impact on the performance of VS. In this paper, the influence of feature representation on the performance of dictionary selection based VS is investigated for the first time. In addition to the traditional hand-crafted features used in VS, such as the color histogram, deep features learned by deep neural networks are used for the first time to represent video frames for dictionary selection based VS. The impact of dimensionality reduction of the high-dimensional deep features on VS is further discussed. Experimental results on a benchmark video dataset demonstrate that deep learning features achieve better performance than traditional hand-crafted features for dictionary selection based VS. Moreover, the dimensionality of deep learning features can be reduced to lower the computational cost without degrading VS performance.
KW - Deep learning
KW - Feature representation
KW - Sparse reconstruction
KW - Video summarization
UR - http://www.scopus.com/inward/record.url?scp=85045343937&partnerID=8YFLogxK
U2 - 10.1109/ICIP.2017.8296815
DO - 10.1109/ICIP.2017.8296815
M3 - Conference contribution
AN - SCOPUS:85045343937
T3 - Proceedings - International Conference on Image Processing, ICIP
SP - 2911
EP - 2915
BT - 2017 IEEE International Conference on Image Processing, ICIP 2017 - Proceedings
PB - IEEE Computer Society
T2 - 24th IEEE International Conference on Image Processing, ICIP 2017
Y2 - 17 September 2017 through 20 September 2017
ER -