MMT: Mixed-Mask Transformer for Remote Sensing Image Semantic Segmentation

Research output: Contribution to journal; Article; peer-review

17 Scopus citations

Abstract

Semantic segmentation of remote sensing images is a crucial step in the intelligent interpretation of remote sensing data. Most current approaches rely on attention mechanisms to enhance long-range representations. However, these works overlook the key problem of foreground-background imbalance, and their performance has hit a bottleneck. In this article, we introduce mask classification into remote sensing image interpretation for the first time and propose a novel mixed-mask Transformer (MMT) for remote sensing image semantic segmentation. Specifically, we propose a mixed-mask attention mechanism, a simple but effective module that helps the network learn more explicit intraclass and interclass correlations by capturing long-range interdependent representations. In addition, we propose a progressive multiscale learning strategy (MSL) to handle the large variation in target scale in remote sensing images; it integrates semantic and visual representations of targets at different scales by efficiently utilizing large-scale feature maps in the Transformer. Experimental results show that the proposed MMT outperforms existing alternative approaches and achieves state-of-the-art performance on three semantic segmentation datasets.
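For readers unfamiliar with mask classification, the following is a minimal sketch of how a set of per-query class scores and binary masks is commonly combined into a semantic map. It illustrates the generic mask-classification formulation only, not the MMT architecture or its mixed-mask attention; the function name `semantic_map`, the input layout, and the trailing "no object" class are assumptions made for illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def semantic_map(class_logits, mask_logits):
    """Combine per-query predictions into a per-pixel label map.

    class_logits: [Q][C + 1] scores per query; the last entry is an
                  assumed "no object" class that is dropped.
    mask_logits:  [Q][H][W] mask scores per query.
    Returns an [H][W] list of class indices.
    """
    num_queries = len(class_logits)
    num_classes = len(class_logits[0]) - 1
    class_probs = [softmax(row)[:num_classes] for row in class_logits]
    height, width = len(mask_logits[0]), len(mask_logits[0][0])

    labels = []
    for y in range(height):
        row = []
        for x in range(width):
            # Score of class c at this pixel: sum over queries of
            # P(class c | query) * P(pixel belongs to the query's mask).
            scores = [
                sum(
                    class_probs[q][c] * sigmoid(mask_logits[q][y][x])
                    for q in range(num_queries)
                )
                for c in range(num_classes)
            ]
            row.append(max(range(num_classes), key=scores.__getitem__))
        labels.append(row)
    return labels


# Toy example: two queries, two classes, a 1x2 image.
toy_class_logits = [[5.0, 0.0, 0.0], [0.0, 5.0, 0.0]]
toy_mask_logits = [[[4.0, -4.0]], [[-4.0, 4.0]]]
print(semantic_map(toy_class_logits, toy_mask_logits))  # [[0, 1]]
```

In this toy case the first query claims the left pixel for class 0 and the second query claims the right pixel for class 1, so each pixel takes the label of the query whose mask covers it.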

Original language: English
Article number: 5613415
Journal: IEEE Transactions on Geoscience and Remote Sensing
Volume: 61
State: Published - 2023

Keywords

  • Attention mechanism
  • foreground-background imbalance
  • remote sensing
  • semantic segmentation
  • Transformer
