@inproceedings{ac7700da78f64f1ab93df22a6da036f5,
  % Acronym MMGLOTS braced so sentence-casing styles preserve its capitals.
  title     = {{MMGLOTS}: Multi-Modal Global-Local Transformer Segmentor for Remote Sensing Image Segmentation},
  abstract  = {Multi-modal semantic segmentation of remote sensing (RS) images is a challenging task due to the complex relationship between different modalities and the large intra-class variance of objects in RS images. Existing semantic segmentation methods can only utilize the information of a single modality, which is not sufficient to obtain accurate segmentation results. To address this problem, in this paper, a novel multimodal global-local transformer segmentor (MMGLOTS) is proposed to cope with the multi-modal semantic segmentation task. Specifically, the semantic features of each modality are extracted by the multi-modal semantic feature extractor (MMSFE) with an adaptive fusion strategy. Then, the features are aggregated, and deep representations of both local and global dependencies are obtained by the global-local transformer (GLT). The final prediction is obtained by progressively restoring the deep representations with a prediction restorer (PR). Extensive experiments on two multi-modal semantic segmentation datasets show that our method achieves superior performance and the proposed method achieves the first place on the newly held Cross-City Multi-modal Semantic Segmentation Challenge 2023.},
  keywords  = {Global-local, Multi-modal, Semantic segmentation, Transformer},
  % Names in unambiguous "Last, First" form.
  author    = {Liu, Yuheng and Wang, Ye and Zhang, Yifan and Mei, Shaohui},
  note      = {Publisher Copyright: {\textcopyright} 2023 IEEE.; 13th Workshop on Hyperspectral Imaging and Signal Processing: Evolution in Remote Sensing, WHISPERS 2023 ; Conference date: 31-10-2023 Through 02-11-2023},
  year      = {2023},
  doi       = {10.1109/WHISPERS61460.2023.10431036},
  % Was "英语" (Chinese word for "English") — locale artifact from the exporting tool.
  language  = {English},
  series    = {Workshop on Hyperspectral Image and Signal Processing, Evolution in Remote Sensing},
  publisher = {IEEE Computer Society},
  booktitle = {2023 13th Workshop on Hyperspectral Imaging and Signal Processing},
}