Exploring Multi-Level Attention and Semantic Relationship for Remote Sensing Image Captioning

Zhenghang Yuan, Xuelong Li, Qi Wang

Research output: Contribution to journal › Article › peer-review

45 Scopus citations

Abstract

Remote sensing image captioning, which aims to understand high-level semantic information and the interactions of different ground objects, has emerged as a new research topic in recent years. Although image captioning has developed rapidly with convolutional neural networks (CNNs) and recurrent neural networks (RNNs), captioning remote sensing images still suffers from two main limitations. The first is that the scales of objects in remote sensing images vary dramatically, which makes it difficult to obtain an effective image representation. The second is that visual relationships in remote sensing images remain underused, even though they have great potential to improve the final performance. To address these two limitations, this paper proposes an effective framework for remote sensing image captioning based on multi-level attention and multi-label attribute graph convolution. Specifically, the proposed multi-level attention module can adaptively focus not only on specific spatial features but also on features of specific scales. Moreover, the designed attribute graph convolution module employs the attribute graph to learn more effective attribute features for image captioning. Extensive experiments show that the proposed method achieves superior performance on the UCM-captions, Sydney-captions, and RSICD datasets.
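To make the two modules described in the abstract concrete, below is a minimal PyTorch sketch, not the authors' exact design: all module names, dimensions, and the attribute co-occurrence adjacency are illustrative assumptions. It shows a multi-level attention block that attends over spatial positions within each CNN scale and then over the scales themselves, and a single graph-convolution layer that propagates attribute embeddings over an attribute graph.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiLevelAttention(nn.Module):
    """Attend over spatial positions within each CNN scale, then over scales."""
    def __init__(self, feat_dim):
        super().__init__()
        self.spatial_score = nn.Linear(feat_dim, 1)  # score per spatial position
        self.scale_score = nn.Linear(feat_dim, 1)    # score per scale

    def forward(self, feats):
        # feats: list of (batch, num_positions_i, feat_dim) tensors, one per
        # CNN stage, assumed already projected to a common feat_dim.
        per_scale = []
        for f in feats:
            w = F.softmax(self.spatial_score(f), dim=1)    # (B, N_i, 1)
            per_scale.append((w * f).sum(dim=1))           # (B, feat_dim)
        scales = torch.stack(per_scale, dim=1)             # (B, S, feat_dim)
        s = F.softmax(self.scale_score(scales), dim=1)     # (B, S, 1)
        return (s * scales).sum(dim=1)                     # (B, feat_dim)

class AttributeGCN(nn.Module):
    """One graph-convolution layer over attribute nodes: H' = ReLU(A_hat H W)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.proj = nn.Linear(in_dim, out_dim, bias=False)

    def forward(self, node_feats, adj_norm):
        # node_feats: (num_attributes, in_dim) attribute word embeddings
        # adj_norm: (num_attributes, num_attributes) normalized adjacency,
        # e.g. built from attribute co-occurrence statistics (assumption).
        return F.relu(self.proj(adj_norm @ node_feats))

# Toy usage: two CNN scales (7x7 and 14x14 grids) and a 20-attribute graph.
feats = [torch.randn(2, 49, 512), torch.randn(2, 196, 512)]
img_vec = MultiLevelAttention(512)(feats)                  # (2, 512)
attr_feats = AttributeGCN(300, 512)(torch.randn(20, 300), torch.eye(20))
```

In the full framework, representations like these would jointly condition an RNN decoder that generates the caption word by word.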

Original language: English
Article number: 8943170
Pages (from-to): 2608-2620
Number of pages: 13
Journal: IEEE Access
Volume: 8
DOIs
State: Published - 2020

Keywords

  • deep learning
  • graph convolutional networks (GCNs)
  • image captioning
  • remote sensing image
  • semantic understanding
