U3M: Unbiased Multiscale Modal Fusion Model for Multimodal Semantic Segmentation

Bingyu Li, Da Zhang, Zhiyuan Zhao, Junyu Gao, Xuelong Li

Research output: Contribution to journal › Article › peer-review

Abstract

Multimodal Semantic Segmentation is a pivotal component of transportation systems and typically surpasses unimodal methods by exploiting rich information from multiple sources. Current models frequently adopt modality-specific frameworks that are inherently biased toward certain modalities. Although these biases might be advantageous in specific situations, they generally limit the adaptability of the models across different multimodal contexts, thereby potentially impairing performance. To address this issue, we leverage the inherent capabilities of the model itself to discover the optimal equilibrium in multimodal fusion and introduce U3M, an Unbiased Multiscale Modal Fusion Model for Multimodal Semantic Segmentation. Specifically, this method performs an unbiased integration of multimodal visual data. Additionally, we employ feature fusion at multiple scales to ensure the effective extraction and integration of both global and local features. Experimental results demonstrate that our approach achieves superior performance across multiple datasets, verifying its efficacy in enhancing the robustness and versatility of semantic segmentation in diverse settings. Our code is available at https://github.com/LiBingyu01/U3M-multimodal-semantic-segmentation.
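To make the idea of unbiased multiscale fusion more concrete, below is a minimal PyTorch sketch, not the authors' released implementation (see the linked repository for that). It fuses two modality feature maps with a single shared projection at several pooling scales, so neither modality receives a dedicated, biased pathway; the module name `SymmetricMultiscaleFusion`, the `scales` parameter, and the choice of average pooling and mean combination are all illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SymmetricMultiscaleFusion(nn.Module):
    """Hypothetical sketch: fuse two modality feature maps at several scales
    with a shared, modality-agnostic projection so neither branch is
    architecturally favored. Names and design choices are assumptions."""

    def __init__(self, channels: int, scales=(1, 2, 4)):
        super().__init__()
        self.scales = scales
        # One shared projection applied to both modalities at every scale,
        # keeping the fusion path identical for each input stream.
        self.proj = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.merge = nn.Conv2d(channels * len(scales), channels, kernel_size=1)

    def forward(self, feat_a: torch.Tensor, feat_b: torch.Tensor) -> torch.Tensor:
        h, w = feat_a.shape[-2:]
        fused_scales = []
        for s in self.scales:
            # Downsample both modalities by the same factor (local vs. global context).
            a = F.avg_pool2d(feat_a, kernel_size=s) if s > 1 else feat_a
            b = F.avg_pool2d(feat_b, kernel_size=s) if s > 1 else feat_b
            # Symmetric, order-independent combination: mean of shared projections.
            fused = 0.5 * (self.proj(a) + self.proj(b))
            # Restore the original spatial resolution before concatenation.
            fused = F.interpolate(fused, size=(h, w), mode="bilinear",
                                  align_corners=False)
            fused_scales.append(fused)
        return self.merge(torch.cat(fused_scales, dim=1))


if __name__ == "__main__":
    rgb_feat = torch.randn(1, 64, 32, 32)    # e.g. RGB branch features
    depth_feat = torch.randn(1, 64, 32, 32)  # e.g. depth/thermal branch features
    out = SymmetricMultiscaleFusion(channels=64)(rgb_feat, depth_feat)
    print(out.shape)  # torch.Size([1, 64, 32, 32])
```

Because the same projection weights and the same combination rule are applied to both inputs, swapping the modalities leaves the output unchanged, which is one simple way to avoid hard-coding a preference for either modality.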

Original language: English
Article number: 111801
Journal: Pattern Recognition
Volume: 168
DOIs
State: Published - Dec 2025

Keywords

  • Multi-modality
  • Multi-scale fusion
  • Semantic segmentation
  • Unbiased modality fusion
