TY - JOUR
T1 - Empower Generalizability for Pansharpening Through Text-Modulated Diffusion Model
AU - Xing, Yinghui
AU - Qu, Litao
AU - Zhang, Shizhou
AU - Feng, Jiapeng
AU - Zhang, Xiuwei
AU - Zhang, Yanning
N1 - Publisher Copyright:
© 1980-2012 IEEE.
PY - 2024
Y1 - 2024
AB - Pansharpening is crucial to remote sensing applications: it fuses high-resolution (HR) panchromatic (PAN) images with low-resolution multispectral (LRMS) images to generate HR multispectral (HRMS) images. Recently, diffusion probabilistic models (DPMs) have provided higher-quality results than regression-based methods when trained on specific pairwise data for a specific purpose. However, their performance degrades when applied to a new satellite dataset with different imaging properties and spectral ranges, which limits their generalization ability. To improve the generalizability of pansharpening, in this article we propose a text-modulated diffusion model (TMDiff) for unified pansharpening across different satellites. TMDiff takes a text-modulated 3-D UNet (TM3DU) as its denoising network to gradually recover HRMS images through iterative refinement over multiple time steps. By introducing each satellite's physical properties as text prompts, TM3DU is able to learn meta-knowledge across different satellites and can thus sharpen LRMS images with diverse spatial and spectral attributes. Extensive experiments on various satellite datasets demonstrate the state-of-the-art performance of our model in both qualitative and quantitative evaluations. Furthermore, our model exhibits superior generalization ability to unseen datasets, highlighting its practical significance. Code is available at https://github.com/codgodtao/TMDiff.
KW - Denoising diffusion probabilistic model (DPM)
KW - text-modulated model
KW - unified pansharpening
UR - http://www.scopus.com/inward/record.url?scp=85200207778&partnerID=8YFLogxK
U2 - 10.1109/TGRS.2024.3434431
DO - 10.1109/TGRS.2024.3434431
M3 - Article
AN - SCOPUS:85200207778
SN - 0196-2892
VL - 62
JO - IEEE Transactions on Geoscience and Remote Sensing
JF - IEEE Transactions on Geoscience and Remote Sensing
M1 - 5633812
ER -