Equivariant Multi-Modality Image Fusion

Zixiang Zhao; Haowen Bai; Jiangshe Zhang; Yulun Zhang; Kai Zhang; Shuang Xu; Dongdong Chen; Radu Timofte; Luc Van Gool

doi:10.1109/CVPR52733.2024.02448

School of Mathematics and Statistics

Research output: Contribution to journal › Conference article › Peer-reviewed

43 citations (Scopus)

Abstract

Multi-modality image fusion is a technique that combines information from different sensors or modalities, enabling the fused image to retain complementary features from each modality, such as functional highlights and texture details. However, effective training of such fusion models is challenging due to the scarcity of ground truth fusion data. To tackle this issue, we propose the Equivariant Multi-Modality imAge fusion (EMMA) paradigm for end-to-end self-supervised learning. Our approach is rooted in the prior knowledge that natural imaging responses are equivariant to certain transformations. Consequently, we introduce a novel training paradigm that encompasses a fusion module, a pseudo-sensing module, and an equivariant fusion module. These components enable the net training to follow the principles of the natural sensing-imaging process while satisfying the equivariant imaging prior. Extensive experiments confirm that EMMA yields high-quality fusion results for infrared-visible and medical images, concurrently facilitating downstream multi-modal segmentation and detection tasks.

Original language	English
Pages (from-to)	25912-25921
Number of pages	10
Journal	Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
DOI	https://doi.org/10.1109/CVPR52733.2024.02448
Publication status	Published - 2024
Event	2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024 - Seattle, United States; Duration: 16 Jun 2024 → 22 Jun 2024

Cite this

@article{895fb9c4234041748006e21014cc3ce8,
title = "Equivariant Multi-Modality Image Fusion",
abstract = "Multi-modality image fusion is a technique that combines information from different sensors or modalities, enabling the fused image to retain complementary features from each modality, such as functional highlights and texture details. However, effective training of such fusion models is challenging due to the scarcity of ground truth fusion data. To tackle this issue, we propose the Equivariant Multi-Modality imAge fusion (EMMA) paradigm for end-to-end self-supervised learning. Our approach is rooted in the prior knowledge that natural imaging responses are equivariant to certain transformations. Consequently, we introduce a novel training paradigm that encompasses a fusion module, a pseudo-sensing module, and an equivariant fusion module. These components enable the net training to follow the principles of the natural sensing-imaging process while satisfying the equivariant imaging prior. Extensive experiments confirm that EMMA yields high-quality fusion results for infrared-visible and medical images, concurrently facilitating downstream multi-modal segmentation and detection tasks.",
keywords = "image fusion, low-level vision",
author = "Zixiang Zhao and Haowen Bai and Jiangshe Zhang and Yulun Zhang and Kai Zhang and Shuang Xu and Dongdong Chen and Radu Timofte and {Van Gool}, Luc",
note = "Publisher Copyright: {\textcopyright} 2024 IEEE.; 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024 ; Conference date: 16-06-2024 Through 22-06-2024",
year = "2024",
doi = "10.1109/CVPR52733.2024.02448",
language = "English",
pages = "25912--25921",
journal = "Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition",
issn = "1063-6919",
publisher = "IEEE Computer Society",
}

TY  - JOUR
T1  - Equivariant Multi-Modality Image Fusion
AU  - Zhao, Zixiang
AU  - Bai, Haowen
AU  - Zhang, Jiangshe
AU  - Zhang, Yulun
AU  - Zhang, Kai
AU  - Xu, Shuang
AU  - Chen, Dongdong
AU  - Timofte, Radu
AU  - Van Gool, Luc
PY  - 2024
Y1  - 2024
N2  - Multi-modality image fusion is a technique that combines information from different sensors or modalities, enabling the fused image to retain complementary features from each modality, such as functional highlights and texture details. However, effective training of such fusion models is challenging due to the scarcity of ground truth fusion data. To tackle this issue, we propose the Equivariant Multi-Modality imAge fusion (EMMA) paradigm for end-to-end self-supervised learning. Our approach is rooted in the prior knowledge that natural imaging responses are equivariant to certain transformations. Consequently, we introduce a novel training paradigm that encompasses a fusion module, a pseudo-sensing module, and an equivariant fusion module. These components enable the net training to follow the principles of the natural sensing-imaging process while satisfying the equivariant imaging prior. Extensive experiments confirm that EMMA yields high-quality fusion results for infrared-visible and medical images, concurrently facilitating downstream multi-modal segmentation and detection tasks.
AB  - Multi-modality image fusion is a technique that combines information from different sensors or modalities, enabling the fused image to retain complementary features from each modality, such as functional highlights and texture details. However, effective training of such fusion models is challenging due to the scarcity of ground truth fusion data. To tackle this issue, we propose the Equivariant Multi-Modality imAge fusion (EMMA) paradigm for end-to-end self-supervised learning. Our approach is rooted in the prior knowledge that natural imaging responses are equivariant to certain transformations. Consequently, we introduce a novel training paradigm that encompasses a fusion module, a pseudo-sensing module, and an equivariant fusion module. These components enable the net training to follow the principles of the natural sensing-imaging process while satisfying the equivariant imaging prior. Extensive experiments confirm that EMMA yields high-quality fusion results for infrared-visible and medical images, concurrently facilitating downstream multi-modal segmentation and detection tasks.
KW  - image fusion
KW  - low-level vision
UR  - http://www.scopus.com/inward/record.url?scp=85218046661&partnerID=8YFLogxK
U2  - 10.1109/CVPR52733.2024.02448
DO  - 10.1109/CVPR52733.2024.02448
M3  - Conference article
AN  - SCOPUS:85218046661
SN  - 1063-6919
SP  - 25912
EP  - 25921
JO  - Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
JF  - Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
T2  - 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024
Y2  - 16 June 2024 through 22 June 2024
ER  -
