Image Captioning Algorithm Based on Sufficient Visual Information and Text Information

Yongqiang Zhao, Yuan Rao, Lianwei Wu, Cong Feng

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

Most existing attention-based image captioning methods use only the current visual and text information at each step to generate the next word, without considering the coherence within the visual information and within the text information. We propose a sufficient visual information (SVI) module to supplement the visual information already contained in the network, and a sufficient text information (STI) module that predicts additional words to supplement the text information contained in the network. The SVI module embeds the attention values from the past two steps into the current attention, in keeping with the coherence of human vision. The STI module predicts the next three words in one step and uses their probabilities jointly for inference. Finally, we combine the two modules into an image captioning model based on sufficient visual information and text information (SVITI), which further integrates existing visual information and future text information in the network and thereby improves captioning performance. All three methods are applied to a classic image captioning algorithm and achieve significant performance improvements over the latest methods on the MS COCO dataset.
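The two mechanisms named in the abstract can be illustrated with a minimal sketch. The abstract does not give the fusion formulas, so everything here is an assumption made for illustration: `blend_attention` mixes the current attention weights with those of the two previous steps using hypothetical coefficients, and `joint_next_word` averages the log-probabilities that three hypothetical prediction heads (1, 2, and 3 words ahead) assign to the same position. Neither function is the authors' implementation.

```python
import math


def blend_attention(current, history, weights=(0.6, 0.3, 0.1)):
    """Mix current attention weights with those of up to two earlier steps.

    current: attention weights over image regions (assumed to sum to 1).
    history: earlier attention vectors, most recent last.
    weights: hypothetical mixing coefficients for (t, t-1, t-2).
    """
    recent = history[-2:][::-1]           # [t-1, t-2], whichever exist
    used = [current] + recent
    coeffs = weights[:len(used)]
    total = sum(coeffs)                   # renormalise when history is short
    return [
        sum(c * vec[i] for c, vec in zip(coeffs, used)) / total
        for i in range(len(current))
    ]


def joint_next_word(heads_now, heads_prev1=None, heads_prev2=None):
    """Pick word t by averaging log-probabilities from several predictions.

    heads_now[0]   : distribution for word t predicted at step t
    heads_prev1[1] : distribution for word t predicted at step t-1
    heads_prev2[2] : distribution for word t predicted at step t-2
    Each distribution is a dict mapping word -> probability.
    """
    sources = [heads_now[0]]
    if heads_prev1 is not None:
        sources.append(heads_prev1[1])
    if heads_prev2 is not None:
        sources.append(heads_prev2[2])
    scores = {
        word: sum(math.log(max(d.get(word, 0.0), 1e-12)) for d in sources)
        / len(sources)
        for word in sources[0]
    }
    best = max(scores, key=scores.get)
    return best, scores
```

In this sketch, calling `blend_attention` with an empty `history` returns the current weights unchanged, and `joint_next_word` falls back to the single 1-ahead head when no earlier predictions are supplied, so both functions behave sensibly on the first decoding steps.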

Original language: English
Title of host publication: Neural Information Processing - 27th International Conference, ICONIP 2020, Proceedings
Editors: Haiqin Yang, Kitsuchart Pasupa, Andrew Chi-Sing Leung, James T. Kwok, Jonathan H. Chan, Irwin King
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 607-615
Number of pages: 9
ISBN (Print): 9783030638221
State: Published - 2020
Externally published: Yes
Event: 27th International Conference on Neural Information Processing, ICONIP 2020 - Bangkok, Thailand
Duration: 18 Nov 2020 to 22 Nov 2020

Publication series

Name: Communications in Computer and Information Science
Volume: 1333
ISSN (Print): 1865-0929
ISSN (Electronic): 1865-0937

Conference

Conference: 27th International Conference on Neural Information Processing, ICONIP 2020
Country/Territory: Thailand
City: Bangkok
Period: 18/11/20 to 22/11/20

Keywords

  • Image captioning
  • Sufficient text information
  • Sufficient visual information
