Modal Feature Disentanglement and Contribution Estimation for Multi-Modality Image Fusion

Tao Zhang, Xiaogang Yang, Ruitao Lu, Dingwen Zhang, Xueli Xie, Zhengjie Zhu

Research output: Contribution to journal › Article › peer-review

Abstract

The multi-modality image fusion (MMIF) task aims to fuse complementary information from different modalities, e.g., salient objects and texture details, to improve image quality and information comprehensiveness. Most current MMIF methods adopt a 'black-box' decoder to generate fused images, which leads to insufficient interpretability and difficulty in training. To address these problems, we convert MMIF into a modality contribution estimation task and propose a novel self-supervised fusion network based on modal feature disentanglement and contribution estimation, named MFDCE-Fuse. First, we construct a contrastive-learning auto-encoder that seamlessly integrates the strengths of CNN and Swin Transformer to capture long-range global features and local texture details, and design a contrastive reconstruction loss to promote the uniqueness and non-redundancy of the captured features. Second, considering that redundant modal features interfere with modal contribution estimation, we propose a feature-disentangled representation framework based on a contrastive constraint to obtain modal-common and modal-private features. The contribution of each modal image to the fusion is then evaluated through the proportion of its modal-private features, which enhances both the interpretability of the fusion process and the quality of the fused image. Furthermore, a weighted perceptual loss and a feature-disentanglement contrastive loss are constructed to guarantee that the private features remain intact. Qualitative and quantitative experimental results demonstrate the applicability and generalization of MFDCE-Fuse across multiple fusion tasks involving visible-infrared and medical image fusion.
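The abstract describes fusion weights derived from the proportion of each modality's private features. The following is a minimal PyTorch sketch of that idea, under the assumption that contributions are computed as the normalized energy of the modal-private feature maps; all names (`estimate_modal_contributions`, the energy measure, the toy fusion rule) are illustrative and not the paper's actual implementation.

```python
import torch


def estimate_modal_contributions(private_feats):
    """Estimate per-modality fusion weights from modal-private features.

    `private_feats` is a list of tensors of shape (B, C, H, W), one per
    modality. Each modality's contribution is taken as the proportion of
    its private-feature energy (an assumed, simplified criterion).
    """
    # Per-modality energy map: mean absolute activation over channels
    energies = torch.stack(
        [f.abs().mean(dim=1, keepdim=True) for f in private_feats], dim=0
    )  # (M, B, 1, H, W)
    # Normalize so the modality weights sum to 1 at every spatial location
    weights = energies / energies.sum(dim=0, keepdim=True).clamp(min=1e-8)
    return weights  # (M, B, 1, H, W)


if __name__ == "__main__":
    # Toy usage: fuse visible and infrared private features with the weights
    vis_priv = torch.rand(1, 64, 32, 32)  # hypothetical visible private features
    ir_priv = torch.rand(1, 64, 32, 32)   # hypothetical infrared private features
    w = estimate_modal_contributions([vis_priv, ir_priv])
    fused = w[0] * vis_priv + w[1] * ir_priv
    print(fused.shape)  # torch.Size([1, 64, 32, 32])
```

Because the weights are normalized proportions rather than outputs of an opaque decoder, they can be inspected directly, which is the interpretability benefit the abstract attributes to contribution estimation.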
