TY - JOUR
T1 - CrossDiff
T2 - Exploring Self-Supervised Representation of Pansharpening via Cross-Predictive Diffusion Model
AU - Xing, Yinghui
AU - Qu, Litao
AU - Zhang, Shizhou
AU - Zhang, Kai
AU - Zhang, Yanning
AU - Bruzzone, Lorenzo
N1 - Publisher Copyright:
© 1992-2012 IEEE.
PY - 2024
Y1 - 2024
N2 - Fusion of a panchromatic (PAN) image and the corresponding multispectral (MS) image is known as pansharpening, which aims to combine the abundant spatial details of the PAN image with the spectral information of the MS image. Due to the absence of high-resolution MS images, available deep-learning-based methods usually follow the paradigm of training at reduced resolution and testing at both reduced and full resolution. When taking original MS and PAN images as inputs, they often obtain sub-optimal results due to the scale variation. In this paper, we propose to explore the self-supervised representation for pansharpening by designing a cross-predictive diffusion model, named CrossDiff. It adopts a two-stage training scheme. In the first stage, we introduce a cross-predictive pretext task to pre-train the UNet structure based on a conditional Denoising Diffusion Probabilistic Model (DDPM). In the second stage, the encoders of the UNets are frozen to directly extract spatial and spectral features from PAN and MS images, and only the fusion head is trained to adapt to the pansharpening task. Extensive experiments show the effectiveness and superiority of the proposed model compared with state-of-the-art supervised and unsupervised methods. Besides, cross-sensor experiments also verify the generalization ability of the proposed self-supervised representation learners on other satellite datasets. Code is available at https://github.com/codgodtao/CrossDiff.
AB - Fusion of a panchromatic (PAN) image and the corresponding multispectral (MS) image is known as pansharpening, which aims to combine the abundant spatial details of the PAN image with the spectral information of the MS image. Due to the absence of high-resolution MS images, available deep-learning-based methods usually follow the paradigm of training at reduced resolution and testing at both reduced and full resolution. When taking original MS and PAN images as inputs, they often obtain sub-optimal results due to the scale variation. In this paper, we propose to explore the self-supervised representation for pansharpening by designing a cross-predictive diffusion model, named CrossDiff. It adopts a two-stage training scheme. In the first stage, we introduce a cross-predictive pretext task to pre-train the UNet structure based on a conditional Denoising Diffusion Probabilistic Model (DDPM). In the second stage, the encoders of the UNets are frozen to directly extract spatial and spectral features from PAN and MS images, and only the fusion head is trained to adapt to the pansharpening task. Extensive experiments show the effectiveness and superiority of the proposed model compared with state-of-the-art supervised and unsupervised methods. Besides, cross-sensor experiments also verify the generalization ability of the proposed self-supervised representation learners on other satellite datasets. Code is available at https://github.com/codgodtao/CrossDiff.
KW - denoising diffusion probabilistic model
KW - Image fusion
KW - pansharpening
KW - self-supervised learning
UR - http://www.scopus.com/inward/record.url?scp=85204695753&partnerID=8YFLogxK
U2 - 10.1109/TIP.2024.3461476
DO - 10.1109/TIP.2024.3461476
M3 - Article
C2 - 39302803
AN - SCOPUS:85204695753
SN - 1057-7149
VL - 33
SP - 5496
EP - 5509
JO - IEEE Transactions on Image Processing
JF - IEEE Transactions on Image Processing
ER -