Word-Sentence Framework for Remote Sensing Image Captioning

Qi Wang, Wei Huang, Xueting Zhang, Xuelong Li

Research output: Contribution to journal › Article › peer-review

86 Citations (Scopus)

Abstract

Remote sensing image captioning (RSIC), which aims to generate a well-formed sentence for a remote sensing image, has attracted increasing attention in recent years. The prevailing framework for RSIC is the encoder-decoder architecture, composed of an encoder submodel and a decoder submodel. Although it achieves strong performance, the encoder-decoder architecture is a black-box model that lacks explainability. To overcome this drawback, we propose a new explainable word-sentence framework for RSIC in this article. The proposed framework consists of two parts, a word extractor and a sentence generator: the former extracts the valuable words from the given remote sensing image, while the latter organizes these words into a well-formed sentence. The framework thus decomposes RSIC into a word classification task and a word sorting task, which is more in line with human intuition. On the basis of the word-sentence framework, ablation experiments are conducted on three public RSIC data sets, Sydney-captions, UCM-captions, and RSICD, to explore specific and effective network structures. To evaluate the proposed framework objectively, we further conduct comparative experiments on these three data sets and achieve results comparable with those of encoder-decoder-based methods.
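The two-stage decomposition described above can be illustrated with a minimal sketch: a word extractor posed as multi-label classification over the caption vocabulary, followed by a sentence generator that orders the predicted words. The backbone choice (ResNet-18), module names, and dimensions below are illustrative assumptions, not the specific network structures explored in the paper's ablation experiments.

```python
# Minimal sketch of the word-sentence idea (assumed architecture, not the paper's).
import torch
import torch.nn as nn
import torchvision.models as models


class WordExtractor(nn.Module):
    """Stage 1: multi-label word classification over the caption vocabulary."""

    def __init__(self, vocab_size: int):
        super().__init__()
        backbone = models.resnet18(weights=None)      # assumed CNN encoder
        backbone.fc = nn.Identity()
        self.backbone = backbone
        self.classifier = nn.Linear(512, vocab_size)  # one logit per vocabulary word

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        feats = self.backbone(images)                 # (B, 512) image features
        return torch.sigmoid(self.classifier(feats))  # per-word probabilities


class SentenceGenerator(nn.Module):
    """Stage 2: organize the extracted words into an ordered sentence."""

    def __init__(self, vocab_size: int, embed_dim: int = 256, hidden_dim: int = 512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.word_proj = nn.Linear(vocab_size, hidden_dim)  # condition on word scores
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, word_probs: torch.Tensor, tokens: torch.Tensor) -> torch.Tensor:
        # Initialize the LSTM state from the extracted-word probabilities.
        h0 = torch.tanh(self.word_proj(word_probs)).unsqueeze(0)  # (1, B, H)
        c0 = torch.zeros_like(h0)
        hidden, _ = self.lstm(self.embed(tokens), (h0, c0))
        return self.out(hidden)                        # next-word logits per step


if __name__ == "__main__":
    vocab_size = 1000
    extractor, generator = WordExtractor(vocab_size), SentenceGenerator(vocab_size)
    images = torch.randn(2, 3, 224, 224)
    tokens = torch.randint(0, vocab_size, (2, 15))     # teacher-forcing inputs
    word_probs = extractor(images)                     # (2, 1000)
    logits = generator(word_probs, tokens)             # (2, 15, 1000)
    print(word_probs.shape, logits.shape)
```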

Original language: English
Pages (from-to): 10532-10543
Number of pages: 12
Journal: IEEE Transactions on Geoscience and Remote Sensing
Volume: 59
Issue number: 12
DOI
Publication status: Published - 1 Dec 2021
