MMT: Mixed-Mask Transformer for Remote Sensing Image Semantic Segmentation

Zhe Xu; Jie Geng; Wen Jiang

doi:10.1109/TGRS.2023.3289408

School of Electronics and Information

Northwestern Polytechnical University, Xi'an

Research output: Contribution to journal › Article › peer-review

17 Scopus citations

Abstract

Remote sensing image semantic segmentation is a crucial step in the intelligent interpretation of remote sensing. Most of the current approaches are based on the attention mechanism to enhance long-range representations. However, these works ignore the key problem of foreground-background imbalance, and their performances encounter a bottleneck. In this article, we introduce mask classification into remote sensing image interpretation for the first time and propose a novel mixed-mask Transformer (MMT) for remote sensing image semantic segmentation. Specifically, we propose a mixed-mask attention mechanism, a simple but effective module, which assists the network to learn more explicit intraclass and interclass correlations by capturing long-range interdependent representations. In addition, a progressive multiscale learning strategy (MSL) is proposed to solve the problem of large-scale-varied targets in remote sensing images, which integrates semantic and visual representations of different scale targets by efficiently utilizing large-scale feature maps in Transformer. Experimental results show that the proposed MMT exceeds the existing alternative approaches and achieves state-of-the-art performance on three semantic segmentation datasets.

Original language	English
Article number	5613415
Journal	IEEE Transactions on Geoscience and Remote Sensing
Volume	61
DOIs	https://doi.org/10.1109/TGRS.2023.3289408
State	Published - 2023

Keywords

Attention mechanism
foreground-background imbalance
remote sensing
semantic segmentation
Transformer

Cite this

@article{f6e565fb30ec4523b13f880d981c6de0,

title = "MMT: Mixed-Mask Transformer for Remote Sensing Image Semantic Segmentation",

abstract = "Remote sensing image semantic segmentation is a crucial step in the intelligent interpretation of remote sensing. Most of the current approaches are based on the attention mechanism to enhance long-range representations. However, these works ignore the key problem of foreground-background imbalance, and their performances encounter a bottleneck. In this article, we introduce mask classification into remote sensing image interpretation for the first time and propose a novel mixed-mask Transformer (MMT) for remote sensing image semantic segmentation. Specifically, we propose a mixed-mask attention mechanism, a simple but effective module, which assists the network to learn more explicit intraclass and interclass correlations by capturing long-range interdependent representations. In addition, a progressive multiscale learning strategy (MSL) is proposed to solve the problem of large-scale-varied targets in remote sensing images, which integrates semantic and visual representations of different scale targets by efficiently utilizing large-scale feature maps in Transformer. Experimental results show that the proposed MMT exceeds the existing alternative approaches and achieves state-of-the-art performance on three semantic segmentation datasets.",

keywords = "Attention mechanism, foreground-background imbalance, remote sensing, semantic segmentation, Transformer",

author = "Zhe Xu and Jie Geng and Wen Jiang",

note = "Publisher Copyright: {\textcopyright} 1980-2012 IEEE.",

year = "2023",

doi = "10.1109/TGRS.2023.3289408",

language = "English",

volume = "61",

journal = "IEEE Transactions on Geoscience and Remote Sensing",

issn = "0196-2892",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

}

TY - JOUR

T1 - MMT: Mixed-Mask Transformer for Remote Sensing Image Semantic Segmentation

AU - Xu, Zhe

AU - Geng, Jie

AU - Jiang, Wen

PY - 2023

Y1 - 2023

AB - Remote sensing image semantic segmentation is a crucial step in the intelligent interpretation of remote sensing. Most of the current approaches are based on the attention mechanism to enhance long-range representations. However, these works ignore the key problem of foreground-background imbalance, and their performances encounter a bottleneck. In this article, we introduce mask classification into remote sensing image interpretation for the first time and propose a novel mixed-mask Transformer (MMT) for remote sensing image semantic segmentation. Specifically, we propose a mixed-mask attention mechanism, a simple but effective module, which assists the network to learn more explicit intraclass and interclass correlations by capturing long-range interdependent representations. In addition, a progressive multiscale learning strategy (MSL) is proposed to solve the problem of large-scale-varied targets in remote sensing images, which integrates semantic and visual representations of different scale targets by efficiently utilizing large-scale feature maps in Transformer. Experimental results show that the proposed MMT exceeds the existing alternative approaches and achieves state-of-the-art performance on three semantic segmentation datasets.

KW - Attention mechanism

KW - foreground-background imbalance

KW - remote sensing

KW - semantic segmentation

KW - Transformer

UR - http://www.scopus.com/inward/record.url?scp=85163759886&partnerID=8YFLogxK

U2 - 10.1109/TGRS.2023.3289408

DO - 10.1109/TGRS.2023.3289408

M3 - Article

AN - SCOPUS:85163759886

SN - 0196-2892

VL - 61

JO - IEEE Transactions on Geoscience and Remote Sensing

JF - IEEE Transactions on Geoscience and Remote Sensing

M1 - 5613415

ER -
