Attentive Linear Transformation for Image Captioning

Senmao Ye, Junwei Han, Nian Liu

Research output: Contribution to journal › Article › peer-review

71 Scopus citations

Abstract

We propose a novel attention framework called attentive linear transformation (ALT) for the automatic generation of image captions. Instead of learning spatial or channel-wise attention as in existing models, ALT learns to attend to the high-dimensional transformation matrix that maps the image feature space to the context vector space. ALT can therefore learn various relevant feature abstractions, including spatial attention, channel-wise attention, and visual dependence. In addition, we propose a soft threshold regression to predict the spatial attention probabilities; it preserves more relevant local regions than the popular softmax regression. Extensive experiments on the MS COCO and Flickr30k data sets demonstrate the superiority of our model over other state-of-the-art models.
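The abstract does not give the exact form of the soft threshold regression, but the stated contrast with softmax can be illustrated with a minimal sketch. The thresholded variant below is a hypothetical stand-in (a ReLU-style clip followed by linear renormalization), shown only to make the claimed behavior concrete: unlike softmax, which assigns every region a nonzero weight, it can zero out clearly irrelevant regions while keeping several relevant ones active.

```python
import numpy as np

def softmax_attention(scores):
    # Standard softmax: every region receives a nonzero weight,
    # and large scores dominate exponentially.
    e = np.exp(scores - scores.max())
    return e / e.sum()

def soft_threshold_attention(scores, tau=0.0):
    # Hypothetical soft-threshold normalization (illustration only,
    # not the paper's exact formulation): clip scores below tau to
    # zero, then renormalize linearly so the weights sum to 1.
    shifted = np.maximum(scores - tau, 0.0)
    total = shifted.sum()
    if total == 0.0:
        # Degenerate case: fall back to a uniform distribution.
        return np.full_like(scores, 1.0 / scores.size)
    return shifted / total

# Three relevant regions and one clearly irrelevant one.
scores = np.array([2.0, 1.5, 1.4, -3.0])
w_soft = softmax_attention(scores)
w_thr = soft_threshold_attention(scores, tau=0.0)
# Softmax keeps all four regions active; the thresholded variant
# drops the irrelevant region and spreads weight over the rest.
```

Under this reading, "preserving more relevant local regions" corresponds to the thresholded weights being less peaked over the surviving regions than the exponential softmax weights, while truly irrelevant regions are suppressed to exactly zero.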

Original language: English
Article number: 8410621
Pages (from-to): 5514-5524
Number of pages: 11
Journal: IEEE Transactions on Image Processing
Volume: 27
Issue number: 11
DOIs
State: Published - Nov 2018

Keywords

  • attention
  • CNN
  • image captioning
  • linear transformation
  • LSTM

