Empower Generalizability for Pansharpening Through Text-Modulated Diffusion Model

Yinghui Xing, Litao Qu, Shizhou Zhang, Jiapeng Feng, Xiuwei Zhang, Yanning Zhang

Research output: Contribution to journal › Article › peer-review

3 Scopus citations

Abstract

Pansharpening is crucial to remote sensing applications: it fuses high-resolution (HR) panchromatic (PAN) images with low-resolution multispectral (LRMS) images to generate HR multispectral (HRMS) images. Recently, diffusion probabilistic models (DPMs) have produced higher-quality results than regression-based methods when trained on paired data for a specific purpose. However, their performance degrades when they are applied to a new satellite dataset with different imaging properties and spectral ranges, which limits their generalization ability. To improve the generalizability of pansharpening, in this article we propose a text-modulated diffusion model (TMDiff) for unified pansharpening across different satellites. TMDiff uses a text-modulated 3-D UNet (TM3DU) as its denoising network to gradually recover the HRMS image through iterative refinement over multiple time steps. By introducing each satellite's physical properties as text prompts, TM3DU learns meta-knowledge across different satellites and can thus sharpen LRMS images with diverse spatial and spectral attributes. Extensive experiments on various satellite datasets demonstrate the state-of-the-art performance of our model in both qualitative and quantitative evaluations. Furthermore, our model exhibits superior generalization to unseen datasets, highlighting its practical significance. Code is available at https://github.com/codgodtao/TMDiff.
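The abstract describes TMDiff only at a high level. As a rough illustration of how iterative diffusion refinement and text modulation could fit together, here is a minimal sketch assuming a standard DDPM ancestral sampler with a linear noise schedule and FiLM-style (per-band scale and shift) conditioning on a text embedding. The names `toy_denoiser` and `film`, the fusion rule, the embedding size and the random "prompt embeddings" are all hypothetical stand-ins; this is not TM3DU and not the authors' code, which is in the linked repository.

```python
import numpy as np

def linear_beta_schedule(T, beta_start=1e-4, beta_end=0.02):
    """Linear noise schedule, as in the original DDPM formulation."""
    return np.linspace(beta_start, beta_end, T)

def film(h, text_emb, W_gamma, W_beta):
    """FiLM-style modulation (an assumption, not necessarily TMDiff's mechanism):
    a per-band scale and shift are predicted from the text embedding."""
    gamma = text_emb @ W_gamma          # shape (C,)
    beta = text_emb @ W_beta            # shape (C,)
    return h * (1.0 + gamma[:, None, None]) + beta[:, None, None]

def toy_denoiser(x_t, t, pan, lrms_up, text_emb, params):
    """Hypothetical stand-in for TM3DU: returns a noise estimate for x_t.
    A real network would also embed the time step t; this toy ignores it."""
    h = x_t + 0.1 * (lrms_up + pan)     # crude fusion; pan (1,H,W) broadcasts over C bands
    h = film(h, text_emb, params["W_gamma"], params["W_beta"])
    return 0.01 * h

def ddpm_sample(denoiser, pan, lrms_up, text_emb, params, T=50, seed=0):
    """Ancestral DDPM sampling: start from Gaussian noise, refine over T steps."""
    rng = np.random.default_rng(seed)
    betas = linear_beta_schedule(T)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    x = rng.standard_normal(lrms_up.shape)   # x_T ~ N(0, I), same shape as the HRMS output
    for t in reversed(range(T)):
        eps = denoiser(x, t, pan, lrms_up, text_emb, params)
        # Mean of p(x_{t-1} | x_t) computed from the predicted noise
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:
            x = mean + np.sqrt(betas[t]) * rng.standard_normal(x.shape)
        else:
            x = mean
    return x

# Usage: 4-band image on a 32x32 HR grid, 8-dim text embedding (all sizes hypothetical)
rng = np.random.default_rng(42)
C, H, W, D = 4, 32, 32, 8
pan = rng.random((1, H, W))
lrms_up = rng.random((C, H, W))       # LRMS assumed already upsampled to the PAN grid
params = {"W_gamma": 0.1 * rng.standard_normal((D, C)),
          "W_beta": 0.1 * rng.standard_normal((D, C))}
emb_a = rng.standard_normal(D)        # stand-in embedding of a prompt describing satellite A
emb_b = rng.standard_normal(D)        # stand-in embedding for satellite B
hrms_a = ddpm_sample(toy_denoiser, pan, lrms_up, emb_a, params)
hrms_b = ddpm_sample(toy_denoiser, pan, lrms_up, emb_b, params)
print(hrms_a.shape)  # (4, 32, 32)
```

The point of the sketch is that the same denoiser weights produce different outputs for the same PAN/LRMS inputs when only the text embedding changes, which is the kind of per-satellite modulation the abstract attributes to TM3DU.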

Original language: English
Article number: 5633812
Journal: IEEE Transactions on Geoscience and Remote Sensing
Volume: 62
State: Published - 2024

Keywords

  • Denoising diffusion probabilistic model (DPM)
  • text-modulated model
  • unified pansharpening
