MMGLOTS: Multi-Modal Global-Local Transformer Segmentor for Remote Sensing Image Segmentation

Yuheng Liu, Ye Wang, Yifan Zhang, Shaohui Mei

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

3 Scopus citations

Abstract

Multi-modal semantic segmentation of remote sensing (RS) images is a challenging task due to the complex relationship between different modalities and the large intra-class variance of objects in RS images. Existing semantic segmentation methods can only utilize the information of a single modality, which is not sufficient to obtain accurate segmentation results. To address this problem, a novel multi-modal global-local transformer segmentor (MMGLOTS) is proposed for the multi-modal semantic segmentation task. Specifically, the semantic features of each modality are extracted by the multi-modal semantic feature extractor (MMSFE) with an adaptive fusion strategy. Then, the features are aggregated, and deep representations of both local and global dependencies are obtained by the global-local transformer (GLT). The final prediction is obtained by progressively restoring the deep representations with a prediction restorer (PR). Extensive experiments on two multi-modal semantic segmentation datasets show that the proposed method achieves superior performance, and it took first place in the Cross-City Multi-modal Semantic Segmentation Challenge 2023.

Original language: English
Title of host publication: 2023 13th Workshop on Hyperspectral Imaging and Signal Processing
Subtitle of host publication: Evolution in Remote Sensing, WHISPERS 2023
Publisher: IEEE Computer Society
ISBN (Electronic): 9798350395570
State: Published - 2023
Event: 13th Workshop on Hyperspectral Imaging and Signal Processing: Evolution in Remote Sensing, WHISPERS 2023 - Athens, Greece
Duration: 31 Oct 2023 → 2 Nov 2023

Publication series

Name: Workshop on Hyperspectral Image and Signal Processing, Evolution in Remote Sensing
ISSN (Print): 2158-6276

Conference

Conference: 13th Workshop on Hyperspectral Imaging and Signal Processing: Evolution in Remote Sensing, WHISPERS 2023
Country/Territory: Greece
City: Athens
Period: 31/10/23 → 2/11/23

Keywords

  • Global-local
  • Multi-modal
  • Semantic segmentation
  • Transformer
