Spatial-Channel Attention Transformer With Pseudo Regions for Remote Sensing Image-Text Retrieval

Dongqing Wu, Huihui Li, Yinxuan Hou, Cuili Xu, Gong Cheng, Lei Guo, Hang Liu

Research output: Contribution to journal › Article › peer-review

5 Citations (Scopus)

Abstract

Recently, remote sensing image-text retrieval (RSITR) has received significant attention due to its flexible query form and effective management of remote sensing images. However, prior work often relies on compact global features and ignores local features that can reflect salient objects in the images. Moreover, these methods primarily model interactions between features in the spatial domain, which is insufficient for mining the rich semantic information presented in remote sensing images. In this article, we propose a novel spatial-channel attention transformer (SCAT) with pseudo regions to address these issues. Concretely, in order to acquire the fine-grained perception of local objects, we introduce a pseudo region generation (PRG) module that adaptively aggregates grid features with similar semantic information into multiple clusters through a clustering algorithm. These generated cluster centers are able to flexibly and efficiently represent local objects in remote sensing images without relying on sophisticated object detectors. Furthermore, in order to achieve a comprehensive understanding of image semantics information, we carefully construct a novel SCAT. By exploiting spatial and channel attention to explore the dependencies between features at both spatial and channel domains, the proposed SCAT enhances the model's ability to identify both 'where to look' and 'what it is,' thereby obtaining a more powerful representation. In addition, SCAT incorporates two novel designs that alleviate the high overhead caused by attention modeling. Extensive experiments on two benchmark datasets, RSICD and RSITMD, fully demonstrate the effectiveness and superiority of our proposed method.
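The two ideas in the abstract, pseudo regions obtained by clustering grid features and attention applied along both the spatial and the channel axis, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the authors' implementation. The abstract only says "a clustering algorithm", so plain k-means is swapped in. The attention has no learned projections. The two overhead-reducing designs are not modeled. All function names are hypothetical.

```python
import math

def kmeans_centers(features, k, iters=10):
    """Plain k-means over grid features (assumed stand-in for the PRG
    clustering step); the k centers play the role of pseudo regions."""
    step = len(features) // k
    centers = [list(features[i * step]) for i in range(k)]  # deterministic init
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for f in features:
            # assign each grid feature to its nearest center (squared Euclidean)
            nearest = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(f, centers[c])),
            )
            groups[nearest].append(f)
        for c, members in enumerate(groups):
            if members:  # keep the previous center if a cluster is empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers

def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens):
    """softmax(X X^T / sqrt(d)) X, without learned query/key/value weights."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, t)) / math.sqrt(d) for t in tokens]
        weights = softmax(scores)
        out.append([sum(w * t[j] for w, t in zip(weights, tokens)) for j in range(d)])
    return out

def spatial_attention(regions):
    # tokens are regions: models dependencies between locations ("where to look")
    return self_attention(regions)

def channel_attention(regions):
    # tokens are channels: transpose, attend, transpose back ("what it is")
    channels = [list(col) for col in zip(*regions)]
    attended = self_attention(channels)
    return [list(row) for row in zip(*attended)]

# Toy grid: four 2-D grid features forming two obvious groups.
grid = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
regions = kmeans_centers(grid, k=2)
spatial_out = spatial_attention(regions)
channel_out = channel_attention(regions)
```

In this sketch both attention variants return an array of the same shape as the pseudo-region matrix; the only difference is which axis is treated as the token axis.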

Original language: English
Article number: 4704115
Pages (from-to): 1-15
Number of pages: 15
Journal: IEEE Transactions on Geoscience and Remote Sensing
Volume: 62
DOI
Publication status: Published - 2024
