U3M: Unbiased Multiscale Modal Fusion Model for Multimodal Semantic Segmentation

Bingyu Li; Da Zhang; Zhiyuan Zhao; Junyu Gao; Xuelong Li

doi:10.1016/j.patcog.2025.111801

Institute of Optoelectronics and Intelligence (光电与智能研究院)

Research output: Contribution to journal › Article › peer-review

Abstract

Multimodal Semantic Segmentation is a pivotal component of the transportation system and typically surpasses unimodal methods by utilizing rich information sets from various sources. Current models frequently adopt modality-specific frameworks that are inherently biased toward certain modalities. Although these biases might be advantageous in specific situations, they generally limit the adaptability of the models across different multimodal contexts, thereby potentially impairing performance. To address this issue, we leverage the inherent capabilities of the model itself to discover the optimal equilibrium in multimodal fusion and introduce U3M: An Unbiased Multiscale Modal Fusion Model for Multimodal Semantic Segmentation. Specifically, this method involves an unbiased integration of multimodal visual data. Additionally, we employ feature fusion at multiple scales to ensure the effective extraction and integration of both global and local features. Experimental results demonstrate that our approach achieves superior performance across multiple datasets, verifying its efficacy in enhancing the robustness and versatility of semantic segmentation in diverse settings. Our code is available at https://github.com/LiBingyu01/U3M-multimodal-semantic-segmentation.

Original language	English
Article number	111801
Journal	Pattern Recognition
Volume	168
DOI	https://doi.org/10.1016/j.patcog.2025.111801
Publication status	Published - Dec 2025

Cite this

@article{e6f7d5777246446bacb970bc81b25986,

title = "U3M: Unbiased Multiscale Modal Fusion Model for Multimodal Semantic Segmentation",

abstract = "Multimodal Semantic Segmentation is a pivotal component of the transportation system and typically surpasses unimodal methods by utilizing rich information sets from various sources. Current models frequently adopt modality-specific frameworks that are inherently biased toward certain modalities. Although these biases might be advantageous in specific situations, they generally limit the adaptability of the models across different multimodal contexts, thereby potentially impairing performance. To address this issue, we leverage the inherent capabilities of the model itself to discover the optimal equilibrium in multimodal fusion and introduce U3M: An Unbiased Multiscale Modal Fusion Model for Multimodal Semantic Segmentation. Specifically, this method involves an unbiased integration of multimodal visual data. Additionally, we employ feature fusion at multiple scales to ensure the effective extraction and integration of both global and local features. Experimental results demonstrate that our approach achieves superior performance across multiple datasets, verifying its efficacy in enhancing the robustness and versatility of semantic segmentation in diverse settings. Our code is available at https://github.com/LiBingyu01/U3M-multimodal-semantic-segmentation.",

keywords = "Multi-modality, Multi-scale fusion, Semantic segmentation, Unbiased modality fusion",

author = "Bingyu Li and Da Zhang and Zhiyuan Zhao and Junyu Gao and Xuelong Li",

note = "Publisher Copyright: {\textcopyright} 2025",

year = "2025",

month = dec,

doi = "10.1016/j.patcog.2025.111801",

language = "English",

volume = "168",

journal = "Pattern Recognition",

issn = "0031-3203",

publisher = "Elsevier Ltd",

}

TY - JOUR

T1 - U3M

T2 - Unbiased Multiscale Modal Fusion Model for Multimodal Semantic Segmentation

AU - Li, Bingyu

AU - Zhang, Da

AU - Zhao, Zhiyuan

AU - Gao, Junyu

AU - Li, Xuelong

PY - 2025/12

Y1 - 2025/12

N2 - Multimodal Semantic Segmentation is a pivotal component of the transportation system and typically surpasses unimodal methods by utilizing rich information sets from various sources. Current models frequently adopt modality-specific frameworks that are inherently biased toward certain modalities. Although these biases might be advantageous in specific situations, they generally limit the adaptability of the models across different multimodal contexts, thereby potentially impairing performance. To address this issue, we leverage the inherent capabilities of the model itself to discover the optimal equilibrium in multimodal fusion and introduce U3M: An Unbiased Multiscale Modal Fusion Model for Multimodal Semantic Segmentation. Specifically, this method involves an unbiased integration of multimodal visual data. Additionally, we employ feature fusion at multiple scales to ensure the effective extraction and integration of both global and local features. Experimental results demonstrate that our approach achieves superior performance across multiple datasets, verifying its efficacy in enhancing the robustness and versatility of semantic segmentation in diverse settings. Our code is available at https://github.com/LiBingyu01/U3M-multimodal-semantic-segmentation.

AB - Multimodal Semantic Segmentation is a pivotal component of the transportation system and typically surpasses unimodal methods by utilizing rich information sets from various sources. Current models frequently adopt modality-specific frameworks that are inherently biased toward certain modalities. Although these biases might be advantageous in specific situations, they generally limit the adaptability of the models across different multimodal contexts, thereby potentially impairing performance. To address this issue, we leverage the inherent capabilities of the model itself to discover the optimal equilibrium in multimodal fusion and introduce U3M: An Unbiased Multiscale Modal Fusion Model for Multimodal Semantic Segmentation. Specifically, this method involves an unbiased integration of multimodal visual data. Additionally, we employ feature fusion at multiple scales to ensure the effective extraction and integration of both global and local features. Experimental results demonstrate that our approach achieves superior performance across multiple datasets, verifying its efficacy in enhancing the robustness and versatility of semantic segmentation in diverse settings. Our code is available at https://github.com/LiBingyu01/U3M-multimodal-semantic-segmentation.

KW - Multi-modality

KW - Multi-scale fusion

KW - Semantic segmentation

KW - Unbiased modality fusion

UR - http://www.scopus.com/inward/record.url?scp=105005867804&partnerID=8YFLogxK

U2 - 10.1016/j.patcog.2025.111801

DO - 10.1016/j.patcog.2025.111801

M3 - Article

AN - SCOPUS:105005867804

SN - 0031-3203

VL - 168

JO - Pattern Recognition

JF - Pattern Recognition

M1 - 111801

ER -