TY - JOUR
T1 - CrossDiff
T2 - Exploring Self-Supervised Representation of Pansharpening via Cross-Predictive Diffusion Model
AU - Xing, Yinghui
AU - Qu, Litao
AU - Zhang, Shizhou
AU - Zhang, Kai
AU - Zhang, Yanning
AU - Bruzzone, Lorenzo
N1 - Publisher Copyright:
© 1992-2012 IEEE.
PY - 2024
Y1 - 2024
N2 - Fusion of a panchromatic (PAN) image and the corresponding multispectral (MS) image is known as pansharpening, which aims to combine the abundant spatial details of the PAN image with the spectral information of the MS image. Due to the absence of high-resolution MS images, available deep-learning-based methods usually follow the paradigm of training at reduced resolution and testing at both reduced and full resolution. When taking original MS and PAN images as inputs, they often obtain sub-optimal results due to the scale variation. In this paper, we propose to explore the self-supervised representation for pansharpening by designing a cross-predictive diffusion model, named CrossDiff. It adopts a two-stage training scheme. In the first stage, we introduce a cross-predictive pretext task to pre-train the UNet structure based on a conditional Denoising Diffusion Probabilistic Model (DDPM). In the second stage, the encoders of the UNets are frozen to directly extract spatial and spectral features from PAN and MS images, and only the fusion head is trained to adapt to the pansharpening task. Extensive experiments show the effectiveness and superiority of the proposed model compared with state-of-the-art supervised and unsupervised methods. Besides, cross-sensor experiments also verify the generalization ability of the proposed self-supervised representation learners on other satellite datasets. Code is available at https://github.com/codgodtao/CrossDiff.
AB - Fusion of a panchromatic (PAN) image and the corresponding multispectral (MS) image is known as pansharpening, which aims to combine the abundant spatial details of the PAN image with the spectral information of the MS image. Due to the absence of high-resolution MS images, available deep-learning-based methods usually follow the paradigm of training at reduced resolution and testing at both reduced and full resolution. When taking original MS and PAN images as inputs, they often obtain sub-optimal results due to the scale variation. In this paper, we propose to explore the self-supervised representation for pansharpening by designing a cross-predictive diffusion model, named CrossDiff. It adopts a two-stage training scheme. In the first stage, we introduce a cross-predictive pretext task to pre-train the UNet structure based on a conditional Denoising Diffusion Probabilistic Model (DDPM). In the second stage, the encoders of the UNets are frozen to directly extract spatial and spectral features from PAN and MS images, and only the fusion head is trained to adapt to the pansharpening task. Extensive experiments show the effectiveness and superiority of the proposed model compared with state-of-the-art supervised and unsupervised methods. Besides, cross-sensor experiments also verify the generalization ability of the proposed self-supervised representation learners on other satellite datasets. Code is available at https://github.com/codgodtao/CrossDiff.
KW - denoising diffusion probabilistic model
KW - Image fusion
KW - pansharpening
KW - self-supervised learning
UR - http://www.scopus.com/inward/record.url?scp=85204695753&partnerID=8YFLogxK
U2 - 10.1109/TIP.2024.3461476
DO - 10.1109/TIP.2024.3461476
M3 - Article
C2 - 39302803
AN - SCOPUS:85204695753
SN - 1057-7149
VL - 33
SP - 5496
EP - 5509
JO - IEEE Transactions on Image Processing
JF - IEEE Transactions on Image Processing
ER -