Video Captioning with Semantic Guiding

Jin Yuan, Chunna Tian, Xiangnan Zhang, Yuxuan Ding, Wei Wei

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

14 Citations (Scopus)

Abstract

Video captioning aims to generate natural-language descriptions of videos. Most existing approaches adopt the encoder-decoder architecture and usually exploit different kinds of visual features, such as temporal and motion features, but they neglect the abundant semantic information in the video. To address this issue, we propose a framework named Semantic Guiding Long Short-Term Memory (SG-LSTM) that jointly explores visual features and semantic attributes. The proposed SG-LSTM has two semantic guiding layers, both of which use three types of semantic attributes - global, object, and verb semantics - to guide the language model toward the most relevant representation when generating sentences. We evaluate our method on the publicly available and challenging Youtube2Text dataset. Experimental results show that our framework outperforms the state-of-the-art methods.
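The abstract does not specify how the semantic attributes interact with the decoder, so the following is only a minimal, illustrative sketch of one plausible reading: the three attribute vectors (global, object, verb) form a gate that modulates the visual feature fed into an LSTM decoding step. All layer names, sizes, and the gating scheme below are assumptions for illustration, not the authors' published SG-LSTM.

```python
# Illustrative sketch only; not the paper's exact SG-LSTM architecture.
import torch
import torch.nn as nn


class SemanticGuidedDecoderStep(nn.Module):
    def __init__(self, feat_dim, sem_dim, hidden_dim, vocab_size):
        super().__init__()
        # Project the three semantic attribute vectors (global / object / verb)
        # into a single gate over the visual feature (an assumed design).
        self.sem_gate = nn.Linear(3 * sem_dim, feat_dim)
        self.lstm = nn.LSTMCell(feat_dim + hidden_dim, hidden_dim)
        self.word_emb = nn.Embedding(vocab_size, hidden_dim)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, visual_feat, sem_global, sem_object, sem_verb,
                prev_word, state):
        # Gate the visual feature with the concatenated semantic attributes,
        # steering the decoder toward the most relevant representation.
        sem = torch.cat([sem_global, sem_object, sem_verb], dim=-1)
        gate = torch.sigmoid(self.sem_gate(sem))
        guided_feat = gate * visual_feat
        # Standard LSTM step on [guided visual feature; previous word embedding].
        x = torch.cat([guided_feat, self.word_emb(prev_word)], dim=-1)
        h, c = self.lstm(x, state)
        return self.out(h), (h, c)


# Toy usage with random tensors (batch of 2, hypothetical dimensions).
step = SemanticGuidedDecoderStep(feat_dim=2048, sem_dim=300,
                                 hidden_dim=512, vocab_size=10000)
state = (torch.zeros(2, 512), torch.zeros(2, 512))
logits, state = step(torch.randn(2, 2048), torch.randn(2, 300),
                     torch.randn(2, 300), torch.randn(2, 300),
                     torch.randint(0, 10000, (2,)), state)
```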

Original language: English
Title of host publication: 2018 IEEE 4th International Conference on Multimedia Big Data, BigMM 2018
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781538653210
DOI
Publication status: Published - 18 Oct 2018
Event: 4th IEEE International Conference on Multimedia Big Data, BigMM 2018 - Xi'an, China
Duration: 13 Sep 2018 → 16 Sep 2018

Publication series

Name: 2018 IEEE 4th International Conference on Multimedia Big Data, BigMM 2018

Conference

Conference: 4th IEEE International Conference on Multimedia Big Data, BigMM 2018
Country/Territory: China
City: Xi'an
Period: 13/09/18 → 16/09/18
