MAM-RNN: Multi-level attention model based RNN for video captioning

Xuelong Li, Bin Zhao, Xiaoqiang Lu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

99 Citations (Scopus)

Abstract

Visual information is quite important for the task of video captioning. However, a video usually contains a lot of content that is uncorrelated with the caption, which may interfere with generating a correct caption. Motivated by this, we attempt to exploit the visual features that are most correlated with the caption. In this paper, a Multi-level Attention Model based Recurrent Neural Network (MAM-RNN) is proposed, where the MAM is utilized to encode the visual features and the RNN works as the decoder to generate the video caption. During generation, the proposed approach is able to adaptively attend to the salient regions in each frame and to the frames correlated with the caption. The experimental results on two benchmark datasets, i.e., MSVD and Charades, show the excellent performance of the proposed approach.
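The abstract's idea of attending first to salient regions within each frame and then to the frames most relevant to the caption can be sketched as a two-level attention decoder. The snippet below is a minimal illustrative sketch in PyTorch, not the authors' implementation: the additive (Bahdanau-style) scoring, the LSTM decoder, and all layer sizes are assumptions for illustration only.

```python
import torch
import torch.nn as nn

class TwoLevelAttentionDecoder(nn.Module):
    """Illustrative two-level attention captioner (not the paper's exact MAM-RNN).

    Region-level attention pools each frame's spatial features into one frame
    vector, then frame-level attention pools those vectors into a single visual
    context that conditions an LSTM decoder at every word step.
    """

    def __init__(self, feat_dim=512, hidden_dim=512, vocab_size=10000, embed_dim=300):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTMCell(embed_dim + feat_dim, hidden_dim)
        # Additive attention scorers for regions and frames (assumed form).
        self.region_score = nn.Linear(feat_dim + hidden_dim, 1)
        self.frame_score = nn.Linear(feat_dim + hidden_dim, 1)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def _attend(self, feats, h, scorer):
        # feats: (batch, ..., N, feat_dim); h: (batch, hidden_dim),
        # broadcast over every leading dimension of feats except batch.
        shape = feats.shape[:-1] + (h.shape[-1],)
        h_exp = h.view(h.shape[0], *([1] * (feats.dim() - 2)), h.shape[-1]).expand(shape)
        scores = scorer(torch.cat([feats, h_exp], dim=-1)).squeeze(-1)
        weights = torch.softmax(scores, dim=-1)
        return (weights.unsqueeze(-1) * feats).sum(dim=-2)

    def forward(self, region_feats, captions):
        # region_feats: (batch, T_frames, R_regions, feat_dim)
        # captions:     (batch, T_words) token ids, teacher-forced
        batch = region_feats.size(0)
        h = region_feats.new_zeros(batch, self.lstm.hidden_size)
        c = torch.zeros_like(h)
        logits = []
        for t in range(captions.size(1)):
            # Level 1: attend over regions inside every frame.
            frame_feats = self._attend(region_feats, h, self.region_score)
            # Level 2: attend over frames to get one visual context vector.
            context = self._attend(frame_feats, h, self.frame_score)
            word = self.embed(captions[:, t])
            h, c = self.lstm(torch.cat([word, context], dim=-1), (h, c))
            logits.append(self.out(h))
        return torch.stack(logits, dim=1)  # (batch, T_words, vocab_size)

# Hypothetical usage with random features: 2 clips, 8 frames, 16 regions per frame.
model = TwoLevelAttentionDecoder()
feats = torch.randn(2, 8, 16, 512)
caps = torch.randint(0, 10000, (2, 12))
print(model(feats, caps).shape)  # torch.Size([2, 12, 10000])
```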

Original language: English
Title of host publication: 26th International Joint Conference on Artificial Intelligence, IJCAI 2017
Editors: Carles Sierra
Publisher: International Joint Conferences on Artificial Intelligence
Pages: 2208-2214
Number of pages: 7
ISBN (electronic): 9780999241103
DOI
Publication status: Published - 2017
Event: 26th International Joint Conference on Artificial Intelligence, IJCAI 2017 - Melbourne, Australia
Duration: 19 Aug 2017 → 25 Aug 2017

Publication series

Name: IJCAI International Joint Conference on Artificial Intelligence
ISSN (Print): 1045-0823

Conference

Conference: 26th International Joint Conference on Artificial Intelligence, IJCAI 2017
Country/Territory: Australia
City: Melbourne
Period: 19/08/17 → 25/08/17
