Spatial-Channel Attention Transformer With Pseudo Regions for Remote Sensing Image-Text Retrieval

Dongqing Wu, Huihui Li, Yinxuan Hou, Cuili Xu, Gong Cheng, Lei Guo, Hang Liu

Research output: Contribution to journal › Article › peer-review

5 Scopus citations

Abstract

Recently, remote sensing image-text retrieval (RSITR) has received significant attention due to its flexible query form and effective management of remote sensing images. However, prior work often relies on compact global features and ignores local features that can reflect salient objects in the images. Moreover, these methods primarily model interactions between features in the spatial domain, which is insufficient for mining the rich semantic information present in remote sensing images. In this article, we propose a novel spatial-channel attention transformer (SCAT) with pseudo regions to address these issues. Concretely, to acquire fine-grained perception of local objects, we introduce a pseudo region generation (PRG) module that adaptively aggregates grid features with similar semantic information into multiple clusters through a clustering algorithm. The generated cluster centers can flexibly and efficiently represent local objects in remote sensing images without relying on sophisticated object detectors. Furthermore, to achieve a comprehensive understanding of image semantics, we carefully construct a novel SCAT. By exploiting spatial and channel attention to explore the dependencies between features in both the spatial and channel domains, the proposed SCAT enhances the model's ability to identify both 'where to look' and 'what it is,' thereby obtaining a more powerful representation. In addition, SCAT incorporates two novel designs that alleviate the high overhead caused by attention modeling. Extensive experiments on two benchmark datasets, RSICD and RSITMD, demonstrate the effectiveness and superiority of the proposed method.
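The two ideas in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the clustering algorithm or the exact attention formulation, so plain k-means and unparameterized scaled dot-product attention are assumed here. All function names (kmeans_pseudo_regions, spatial_attention, channel_attention) and the toy grid are hypothetical.

```python
import math
import random


def kmeans_pseudo_regions(grid, k, iters=10, seed=0):
    """Cluster grid feature vectors with plain k-means (an assumed choice).

    Returns (centers, assign); the centers play the role of pseudo regions.
    """
    rng = random.Random(seed)
    centers = [list(v) for v in rng.sample(grid, k)]
    assign = [0] * len(grid)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        for i, v in enumerate(grid):
            assign[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centers[c])),
            )
        # Update step: mean of the members; an empty cluster keeps its center.
        for c in range(k):
            members = [grid[i] for i in range(len(grid)) if assign[i] == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers, assign


def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def self_attention(x):
    """Scaled dot-product self-attention over the rows of x, no learned weights."""
    d = len(x[0])
    scores = matmul(x, transpose(x))
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, x)


def spatial_attention(tokens):
    # Rows are tokens, so the attention weights form an N x N matrix.
    return self_attention(tokens)


def channel_attention(tokens):
    # Attend over the transposed input, so the weights form a C x C matrix.
    return transpose(self_attention(transpose(tokens)))


# Toy grid: 16 four-dimensional features forming two well-separated groups.
grid = [[1 + 0.01 * i, 0.0, 0.0, 0.0] for i in range(8)]
grid += [[0.0, 1 + 0.01 * i, 0.0, 0.0] for i in range(8)]

centers, assign = kmeans_pseudo_regions(grid, k=2)
sp = spatial_attention(centers)  # relations between pseudo regions
ch = channel_attention(centers)  # relations between feature channels
```

Spatial attention relates the pseudo regions to one another, while channel attention relates feature dimensions; both return a matrix with the same shape as the input, so in this sketch they could be applied in sequence or in parallel.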

Original language: English
Article number: 4704115
Pages (from-to): 1-15
Number of pages: 15
Journal: IEEE Transactions on Geoscience and Remote Sensing
Volume: 62
DOIs
State: Published - 2024

Keywords

  • Attention mechanism
  • feature clustering
  • remote sensing image-text retrieval (RSITR)
  • transformer
