TY - JOUR
T1 - MMT: Mixed-Mask Transformer for Remote Sensing Image Semantic Segmentation
T2 - IEEE Transactions on Geoscience and Remote Sensing
AU - Xu, Zhe
AU - Geng, Jie
AU - Jiang, Wen
N1 - Publisher Copyright:
© 2023 IEEE.
PY - 2023
Y1 - 2023
AB - Remote sensing image semantic segmentation is a crucial step in the intelligent interpretation of remote sensing imagery. Most current approaches rely on the attention mechanism to enhance long-range representations. However, these works ignore the key problem of foreground-background imbalance, and their performance encounters a bottleneck. In this article, we introduce mask classification into remote sensing image interpretation for the first time and propose a novel mixed-mask Transformer (MMT) for remote sensing image semantic segmentation. Specifically, we propose a mixed-mask attention mechanism, a simple but effective module that helps the network learn more explicit intraclass and interclass correlations by capturing long-range interdependent representations. In addition, a progressive multiscale learning strategy (MSL) is proposed to handle targets of widely varying scales in remote sensing images; it integrates semantic and visual representations of targets at different scales by efficiently utilizing large-scale feature maps in the Transformer. Experimental results show that the proposed MMT outperforms existing alternative approaches and achieves state-of-the-art performance on three semantic segmentation datasets.
KW - Attention mechanism
KW - foreground-background imbalance
KW - remote sensing
KW - semantic segmentation
KW - Transformer
UR - http://www.scopus.com/inward/record.url?scp=85163759886&partnerID=8YFLogxK
U2 - 10.1109/TGRS.2023.3289408
DO - 10.1109/TGRS.2023.3289408
M3 - Article
AN - SCOPUS:85163759886
SN - 0196-2892
VL - 61
JO - IEEE Transactions on Geoscience and Remote Sensing
JF - IEEE Transactions on Geoscience and Remote Sensing
M1 - 5613415
ER -