Attentive Linear Transformation for Image Captioning

Senmao Ye, Junwei Han, Nian Liu

Research output: Contribution to journal › Article › peer-review

70 Citations (Scopus)

Abstract

We propose a novel attention framework called attentive linear transformation (ALT) for the automatic generation of image captions. Instead of learning the spatial or channel-wise attention used in existing models, ALT learns to attend to the high-dimensional transformation matrix from the image feature space to the context vector space. ALT can thus learn various relevant feature abstractions, including spatial attention, channel-wise attention, and visual dependence. In addition, we propose a soft threshold regression to predict the spatial attention probabilities; it preserves more relevant local regions than the popular softmax regression. Extensive experiments on the MS COCO and Flickr30k datasets demonstrate the superiority of our model over other state-of-the-art models.
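The contrast between softmax and a soft threshold for spatial attention can be illustrated with a minimal sketch (an assumption for illustration only, not the formulation from the paper): region scores below a threshold are suppressed to exactly zero and the rest renormalized, so several strongly relevant regions retain noticeable weight while weak ones drop out, whereas softmax always leaks some probability mass to every region.

```python
import numpy as np

def softmax(scores):
    # Standard softmax: every region receives a non-zero weight.
    e = np.exp(scores - scores.max())
    return e / e.sum()

def soft_threshold_attention(scores, tau=0.5):
    # Illustrative soft-threshold weighting (an assumption, not the paper's
    # exact formula): scores below tau are zeroed, the remainder renormalized,
    # so multiple relevant regions are preserved and irrelevant ones removed.
    shifted = np.maximum(scores - tau, 0.0)
    total = shifted.sum()
    if total == 0.0:
        return np.full_like(scores, 1.0 / len(scores))  # fall back to uniform
    return shifted / total

# Hypothetical relevance scores for six image regions.
scores = np.array([2.0, 1.8, 0.3, 0.1, 1.9, 0.2])
print(softmax(scores))                   # smooth weights, mass leaks to weak regions
print(soft_threshold_attention(scores))  # weight concentrated on the strong regions
```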

Original language: English
Article number: 8410621
Pages (from-to): 5514-5524
Number of pages: 11
Journal: IEEE Transactions on Image Processing
Volume: 27
Issue number: 11
DOI
Publication status: Published - Nov 2018
