Deep Unfolding Multi-Modal Image Fusion Network via Attribution Analysis

Haowen Bai, Zixiang Zhao, Jiangshe Zhang, Baisong Jiang, Lilun Deng, Yukun Cui, Shuang Xu, Chunxia Zhang

Research output: Contribution to journal › Article › peer-review

Abstract

Multi-modal image fusion synthesizes information from multiple sources into a single image, facilitating downstream tasks such as semantic segmentation. Current approaches primarily focus on acquiring informative fused images at the visual display level through intricate mappings. Although some approaches attempt to jointly optimize image fusion and downstream tasks, these efforts often lack direct guidance or interaction, serving only to assist with a predefined fusion loss. To address this, we propose an “Unfolding Attribution Analysis Fusion network” (UAAFusion), which uses attribution analysis to tailor fused images more effectively for semantic segmentation, enhancing the interaction between fusion and segmentation. Specifically, we utilize attribution analysis techniques to explore the contributions of semantic regions in the source images to task discrimination. At the same time, our fusion algorithm incorporates more beneficial features from the source images, thereby allowing the segmentation task to guide the fusion process. Our method constructs a model-driven unfolding network that uses optimization objectives derived from attribution analysis, with an attribution fusion loss calculated from the current state of the segmentation network. We also develop a new pathway function for attribution analysis, specifically tailored to the fusion tasks in our unfolding network. An attribution attention mechanism is integrated at each network stage, allowing the fusion network to prioritize areas and pixels crucial for high-level recognition tasks. Additionally, to mitigate the information loss in traditional unfolding networks, a memory augmentation module is incorporated into our network to improve information flow across network layers. Extensive experiments demonstrate our method’s superiority in image fusion and its applicability to semantic segmentation. The code is available at https://github.com/HaowenBai/UAAFusion.
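The abstract describes a stage-wise architecture: unfolding stages that refine the fusion estimate, an attribution attention mechanism at each stage, and a memory augmentation module that carries features across stages. The following minimal PyTorch sketch only illustrates how one such stage could be wired together; the module names, shapes, and layer choices are assumptions made for exposition and are not the authors' implementation, which is available at the GitHub link above.

```python
# Illustrative sketch only: one unfolding stage with an attribution-style
# attention map and a simple memory-augmentation pathway. All names and
# shapes are assumptions; see https://github.com/HaowenBai/UAAFusion for
# the authors' actual code.
import torch
import torch.nn as nn


class UnfoldingStageSketch(nn.Module):
    def __init__(self, channels: int = 32):
        super().__init__()
        # Proximal-style refinement of the current fusion features.
        self.prox = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )
        # Attribution attention: maps a single-channel attribution map
        # to per-pixel weights in [0, 1].
        self.attn = nn.Sequential(
            nn.Conv2d(1, channels, 3, padding=1),
            nn.Sigmoid(),
        )
        # Memory augmentation: blends features carried over from earlier stages.
        self.memory_gate = nn.Conv2d(2 * channels, channels, 1)

    def forward(self, fused, attribution_map, memory):
        # Emphasize regions the segmentation task attributes as important.
        weighted = fused * self.attn(attribution_map)
        # Refine, then merge with the memory stream to limit information loss.
        refined = self.prox(weighted)
        updated = self.memory_gate(torch.cat([refined, memory], dim=1))
        return updated, updated  # new fusion features, new memory state


if __name__ == "__main__":
    stage = UnfoldingStageSketch(channels=32)
    f = torch.randn(1, 32, 64, 64)   # current fusion features
    a = torch.rand(1, 1, 64, 64)     # attribution map from the segmentation network
    m = torch.zeros(1, 32, 64, 64)   # memory stream
    out, mem = stage(f, a, m)
    print(out.shape, mem.shape)
```

In this sketch, several stages would be stacked to form the unfolded network, with the attribution map recomputed from the segmentation network's current state before each stage; that looping logic is omitted here.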

Original language: English
Pages (from-to): 3498-3511
Number of pages: 14
Journal: IEEE Transactions on Circuits and Systems for Video Technology
Volume: 35
Issue number: 4
DOI
Publication status: Published - 2025
