MAM-RNN: Multi-level attention model based RNN for video captioning

Xuelong Li, Bin Zhao, Xiaoqiang Lu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

99 Citations (Scopus)

Abstract

Visual information is quite important for the task of video captioning. However, a video usually contains a lot of content that is uncorrelated with the caption, which may interfere with generating a correct caption. Motivated by this, we attempt to exploit the visual features that are most correlated with the caption. In this paper, a Multi-level Attention Model based Recurrent Neural Network (MAM-RNN) is proposed, where the MAM is utilized to encode the visual features and the RNN works as the decoder to generate the video caption. During generation, the proposed approach is able to adaptively attend to the salient regions in each frame and to the frames correlated with the caption. The experimental results on two benchmark datasets, i.e., MSVD and Charades, show the excellent performance of the proposed approach.
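The abstract's idea of attending first to salient regions within each frame and then to the frames most relevant to the caption can be sketched as a two-level attention decoder. The snippet below is a minimal illustrative sketch in PyTorch, not the authors' implementation: the additive (Bahdanau-style) scoring, the LSTM decoder, and all layer sizes are assumptions for illustration only.

```python
import torch
import torch.nn as nn

class TwoLevelAttentionDecoder(nn.Module):
    """Illustrative two-level attention captioner (not the paper's exact MAM-RNN).

    Region-level attention pools each frame's spatial features into one frame
    vector, then frame-level attention pools those vectors into a single visual
    context that conditions an LSTM decoder at every word step.
    """

    def __init__(self, feat_dim=512, hidden_dim=512, vocab_size=10000, embed_dim=300):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTMCell(embed_dim + feat_dim, hidden_dim)
        # Additive attention scorers for regions and frames (assumed form).
        self.region_score = nn.Linear(feat_dim + hidden_dim, 1)
        self.frame_score = nn.Linear(feat_dim + hidden_dim, 1)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def _attend(self, feats, h, scorer):
        # feats: (batch, ..., N, feat_dim); h: (batch, hidden_dim),
        # broadcast over every leading dimension of feats except batch.
        shape = feats.shape[:-1] + (h.shape[-1],)
        h_exp = h.view(h.shape[0], *([1] * (feats.dim() - 2)), h.shape[-1]).expand(shape)
        scores = scorer(torch.cat([feats, h_exp], dim=-1)).squeeze(-1)
        weights = torch.softmax(scores, dim=-1)
        return (weights.unsqueeze(-1) * feats).sum(dim=-2)

    def forward(self, region_feats, captions):
        # region_feats: (batch, T_frames, R_regions, feat_dim)
        # captions:     (batch, T_words) token ids, teacher-forced
        batch = region_feats.size(0)
        h = region_feats.new_zeros(batch, self.lstm.hidden_size)
        c = torch.zeros_like(h)
        logits = []
        for t in range(captions.size(1)):
            # Level 1: attend over regions inside every frame.
            frame_feats = self._attend(region_feats, h, self.region_score)
            # Level 2: attend over frames to get one visual context vector.
            context = self._attend(frame_feats, h, self.frame_score)
            word = self.embed(captions[:, t])
            h, c = self.lstm(torch.cat([word, context], dim=-1), (h, c))
            logits.append(self.out(h))
        return torch.stack(logits, dim=1)  # (batch, T_words, vocab_size)

# Hypothetical usage with random features: 2 clips, 8 frames, 16 regions per frame.
model = TwoLevelAttentionDecoder()
feats = torch.randn(2, 8, 16, 512)
caps = torch.randint(0, 10000, (2, 12))
print(model(feats, caps).shape)  # torch.Size([2, 12, 10000])
```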

Original language: English
Title of host publication: 26th International Joint Conference on Artificial Intelligence, IJCAI 2017
Editors: Carles Sierra
Publisher: International Joint Conferences on Artificial Intelligence
Pages: 2208-2214
Number of pages: 7
ISBN (electronic): 9780999241103
DOI
Publication status: Published - 2017
Event: 26th International Joint Conference on Artificial Intelligence, IJCAI 2017 - Melbourne, Australia
Duration: 19 Aug 2017 → 25 Aug 2017

Publication series

Name: IJCAI International Joint Conference on Artificial Intelligence
ISSN (Print): 1045-0823

Conference

Conference: 26th International Joint Conference on Artificial Intelligence, IJCAI 2017
Country/Territory: Australia
City: Melbourne
Period: 19/08/17 → 25/08/17
