Video Captioning with Semantic Guiding

Jin Yuan, Chunna Tian, Xiangnan Zhang, Yuxuan Ding, Wei Wei

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

14 Citations (Scopus)

Abstract

Video captioning aims to generate natural-language descriptions of videos. Most existing approaches adopt the encoder-decoder architecture and usually exploit different kinds of visual features, such as temporal and motion features, but they neglect the abundant semantic information in the video. To address this issue, we propose a framework named Semantic Guiding Long Short-Term Memory (SG-LSTM) that jointly explores visual features and semantic attributes. The proposed SG-LSTM has two semantic guiding layers, both of which use three types of semantic attributes - global, object, and verb semantics - to guide the language model toward the most relevant representation when generating sentences. We evaluate our method on the publicly available and challenging Youtube2Text dataset. Experimental results show that our framework outperforms the state-of-the-art methods.
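The abstract does not specify how the semantic attributes interact with the decoder, so the following is only a minimal, illustrative sketch of one plausible reading: the three attribute vectors (global, object, verb) form a gate that modulates the visual feature fed into an LSTM decoding step. All layer names, sizes, and the gating scheme below are assumptions for illustration, not the authors' published SG-LSTM.

```python
# Illustrative sketch only; not the paper's exact SG-LSTM architecture.
import torch
import torch.nn as nn


class SemanticGuidedDecoderStep(nn.Module):
    def __init__(self, feat_dim, sem_dim, hidden_dim, vocab_size):
        super().__init__()
        # Project the three semantic attribute vectors (global / object / verb)
        # into a single gate over the visual feature (an assumed design).
        self.sem_gate = nn.Linear(3 * sem_dim, feat_dim)
        self.lstm = nn.LSTMCell(feat_dim + hidden_dim, hidden_dim)
        self.word_emb = nn.Embedding(vocab_size, hidden_dim)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, visual_feat, sem_global, sem_object, sem_verb,
                prev_word, state):
        # Gate the visual feature with the concatenated semantic attributes,
        # steering the decoder toward the most relevant representation.
        sem = torch.cat([sem_global, sem_object, sem_verb], dim=-1)
        gate = torch.sigmoid(self.sem_gate(sem))
        guided_feat = gate * visual_feat
        # Standard LSTM step on [guided visual feature; previous word embedding].
        x = torch.cat([guided_feat, self.word_emb(prev_word)], dim=-1)
        h, c = self.lstm(x, state)
        return self.out(h), (h, c)


# Toy usage with random tensors (batch of 2, hypothetical dimensions).
step = SemanticGuidedDecoderStep(feat_dim=2048, sem_dim=300,
                                 hidden_dim=512, vocab_size=10000)
state = (torch.zeros(2, 512), torch.zeros(2, 512))
logits, state = step(torch.randn(2, 2048), torch.randn(2, 300),
                     torch.randn(2, 300), torch.randn(2, 300),
                     torch.randint(0, 10000, (2,)), state)
```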

Original language: English
Title of host publication: 2018 IEEE 4th International Conference on Multimedia Big Data, BigMM 2018
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781538653210
DOI
Publication status: Published - 18 Oct 2018
Event: 4th IEEE International Conference on Multimedia Big Data, BigMM 2018 - Xi'an, China
Duration: 13 Sep 2018 → 16 Sep 2018

Publication series

Name: 2018 IEEE 4th International Conference on Multimedia Big Data, BigMM 2018

Conference

Conference: 4th IEEE International Conference on Multimedia Big Data, BigMM 2018
Country/Territory: China
City: Xi'an
Period: 13/09/18 → 16/09/18
