U3M: Unbiased Multiscale Modal Fusion Model for Multimodal Semantic Segmentation

Bingyu Li; Da Zhang; Zhiyuan Zhao; Junyu Gao; Xuelong Li

doi:10.1016/j.patcog.2025.111801

School of Artificial Intelligence, OPtics and Electronics

Research output: Contribution to journal › Article › peer-review

Abstract

Multimodal Semantic Segmentation is a pivotal component of the transportation system and typically surpasses unimodal methods by utilizing rich information sets from various sources. Current models frequently adopt modality-specific frameworks that are inherently biased toward certain modalities. Although these biases might be advantageous in specific situations, they generally limit the adaptability of the models across different multimodal contexts, thereby potentially impairing performance. To address this issue, we leverage the inherent capabilities of the model itself to discover the optimal equilibrium in multimodal fusion and introduce U3M: An Unbiased Multiscale Modal Fusion Model for Multimodal Semantic Segmentation. Specifically, this method involves an unbiased integration of multimodal visual data. Additionally, we employ feature fusion at multiple scales to ensure the effective extraction and integration of both global and local features. Experimental results demonstrate that our approach achieves superior performance across multiple datasets, verifying its efficacy in enhancing the robustness and versatility of semantic segmentation in diverse settings. Our code is available at https://github.com/LiBingyu01/U3M-multimodal-semantic-segmentation.

Original language	English
Article number	111801
Journal	Pattern Recognition
Volume	168
DOIs	https://doi.org/10.1016/j.patcog.2025.111801
State	Published - Dec 2025

Keywords

Multi-modality
Multi-scale fusion
Semantic segmentation
Unbiased modality fusion

Cite this

@article{e6f7d5777246446bacb970bc81b25986,

title = "U3M: Unbiased Multiscale Modal Fusion Model for Multimodal Semantic Segmentation",

abstract = "Multimodal Semantic Segmentation is a pivotal component of the transportation system and typically surpasses unimodal methods by utilizing rich information sets from various sources. Current models frequently adopt modality-specific frameworks that are inherently biased toward certain modalities. Although these biases might be advantageous in specific situations, they generally limit the adaptability of the models across different multimodal contexts, thereby potentially impairing performance. To address this issue, we leverage the inherent capabilities of the model itself to discover the optimal equilibrium in multimodal fusion and introduce U3M: An Unbiased Multiscale Modal Fusion Model for Multimodal Semantic Segmentation. Specifically, this method involves an unbiased integration of multimodal visual data. Additionally, we employ feature fusion at multiple scales to ensure the effective extraction and integration of both global and local features. Experimental results demonstrate that our approach achieves superior performance across multiple datasets, verifying its efficacy in enhancing the robustness and versatility of semantic segmentation in diverse settings. Our code is available at https://github.com/LiBingyu01/U3M-multimodal-semantic-segmentation.",

keywords = "Multi-modality, Multi-scale fusion, Semantic segmentation, Unbiased modality fusion",

author = "Bingyu Li and Da Zhang and Zhiyuan Zhao and Junyu Gao and Xuelong Li",

note = "Publisher Copyright: {\textcopyright} 2025",

year = "2025",

month = dec,

doi = "10.1016/j.patcog.2025.111801",

language = "English",

volume = "168",

journal = "Pattern Recognition",

issn = "0031-3203",

publisher = "Elsevier Ltd",

}

TY - JOUR

T1 - U3M

T2 - Unbiased Multiscale Modal Fusion Model for Multimodal Semantic Segmentation

AU - Li, Bingyu

AU - Zhang, Da

AU - Zhao, Zhiyuan

AU - Gao, Junyu

AU - Li, Xuelong

PY - 2025/12

Y1 - 2025/12

N2 - Multimodal Semantic Segmentation is a pivotal component of the transportation system and typically surpasses unimodal methods by utilizing rich information sets from various sources. Current models frequently adopt modality-specific frameworks that are inherently biased toward certain modalities. Although these biases might be advantageous in specific situations, they generally limit the adaptability of the models across different multimodal contexts, thereby potentially impairing performance. To address this issue, we leverage the inherent capabilities of the model itself to discover the optimal equilibrium in multimodal fusion and introduce U3M: An Unbiased Multiscale Modal Fusion Model for Multimodal Semantic Segmentation. Specifically, this method involves an unbiased integration of multimodal visual data. Additionally, we employ feature fusion at multiple scales to ensure the effective extraction and integration of both global and local features. Experimental results demonstrate that our approach achieves superior performance across multiple datasets, verifying its efficacy in enhancing the robustness and versatility of semantic segmentation in diverse settings. Our code is available at https://github.com/LiBingyu01/U3M-multimodal-semantic-segmentation.

AB - Multimodal Semantic Segmentation is a pivotal component of the transportation system and typically surpasses unimodal methods by utilizing rich information sets from various sources. Current models frequently adopt modality-specific frameworks that are inherently biased toward certain modalities. Although these biases might be advantageous in specific situations, they generally limit the adaptability of the models across different multimodal contexts, thereby potentially impairing performance. To address this issue, we leverage the inherent capabilities of the model itself to discover the optimal equilibrium in multimodal fusion and introduce U3M: An Unbiased Multiscale Modal Fusion Model for Multimodal Semantic Segmentation. Specifically, this method involves an unbiased integration of multimodal visual data. Additionally, we employ feature fusion at multiple scales to ensure the effective extraction and integration of both global and local features. Experimental results demonstrate that our approach achieves superior performance across multiple datasets, verifying its efficacy in enhancing the robustness and versatility of semantic segmentation in diverse settings. Our code is available at https://github.com/LiBingyu01/U3M-multimodal-semantic-segmentation.

KW - Multi-modality

KW - Multi-scale fusion

KW - Semantic segmentation

KW - Unbiased modality fusion

UR - http://www.scopus.com/inward/record.url?scp=105005867804&partnerID=8YFLogxK

U2 - 10.1016/j.patcog.2025.111801

DO - 10.1016/j.patcog.2025.111801

M3 - Article

AN - SCOPUS:105005867804

SN - 0031-3203

VL - 168

JO - Pattern Recognition

JF - Pattern Recognition

M1 - 111801

ER -
