Empower Generalizability for Pansharpening Through Text-Modulated Diffusion Model

Yinghui Xing; Litao Qu; Shizhou Zhang; Jiapeng Feng; Xiuwei Zhang; Yanning Zhang

doi:10.1109/TGRS.2024.3434431

School of Computer Science

Northwestern Polytechnical University, Xi'an

Research output: Contribution to journal › Article › peer-review

3 Scopus citations

Abstract

Pansharpening is crucial to remote sensing applications, fusing high-resolution (HR) panchromatic (PAN) images with low-resolution multispectral (LRMS) images to generate HR multispectral (HRMS) images. Recently, diffusion probabilistic models (DPMs) have produced higher-quality results than regression-based methods when trained on specific pairwise data for a specific purpose. However, their performance degrades when they are applied to a new satellite dataset with different imaging properties and spectral ranges, which limits their generalization ability. To improve the generalizability of pansharpening, in this article we propose a text-modulated diffusion model (TMDiff) for unified pansharpening across different satellites. TMDiff uses a text-modulated 3-D UNet (TM3DU) as its denoising network to gradually recover the HRMS image through iterative refinement over multiple time steps. By introducing each satellite's physical properties as text prompts, TM3DU is able to learn meta-knowledge across different satellites and can thus sharpen LRMS images with diverse spatial and spectral attributes. Extensive experiments on various satellite datasets demonstrate the state-of-the-art performance of our model in both qualitative and quantitative evaluations. Furthermore, our model exhibits superior generalization ability to unseen datasets, highlighting its practical significance. Code is available at https://github.com/codgodtao/TMDiff.

Original language	English
Article number	5633812
Journal	IEEE Transactions on Geoscience and Remote Sensing
Volume	62
DOIs	https://doi.org/10.1109/TGRS.2024.3434431
State	Published - 2024

Keywords

Denoising diffusion probabilistic model (DPM)
text-modulated model
unified pansharpening

Access to Document

10.1109/TGRS.2024.3434431

Cite this

@article{52bf7f2f7dc94f0f9cd75d9bbce79196,

title = "Empower Generalizability for Pansharpening Through Text-Modulated Diffusion Model",

abstract = "Pansharpening is crucial to remote sensing applications, fusing high-resolution (HR) panchromatic (PAN) images with low-resolution multispectral (LRMS) images to generate HR multispectral (HRMS) images. Recently, diffusion probabilistic models (DPMs) have produced higher-quality results than regression-based methods when trained on specific pairwise data for a specific purpose. However, their performance degrades when they are applied to a new satellite dataset with different imaging properties and spectral ranges, which limits their generalization ability. To improve the generalizability of pansharpening, in this article we propose a text-modulated diffusion model (TMDiff) for unified pansharpening across different satellites. TMDiff uses a text-modulated 3-D UNet (TM3DU) as its denoising network to gradually recover the HRMS image through iterative refinement over multiple time steps. By introducing each satellite's physical properties as text prompts, TM3DU is able to learn meta-knowledge across different satellites and can thus sharpen LRMS images with diverse spatial and spectral attributes. Extensive experiments on various satellite datasets demonstrate the state-of-the-art performance of our model in both qualitative and quantitative evaluations. Furthermore, our model exhibits superior generalization ability to unseen datasets, highlighting its practical significance. Code is available at https://github.com/codgodtao/TMDiff.",

keywords = "Denoising diffusion probabilistic model (DPM), text-modulated model, unified pansharpening",

author = "Yinghui Xing and Litao Qu and Shizhou Zhang and Jiapeng Feng and Xiuwei Zhang and Yanning Zhang",

note = "Publisher Copyright: {\textcopyright} 1980-2012 IEEE.",

year = "2024",

doi = "10.1109/TGRS.2024.3434431",

language = "English",

volume = "62",

journal = "IEEE Transactions on Geoscience and Remote Sensing",

issn = "0196-2892",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

}

TY - JOUR

T1 - Empower Generalizability for Pansharpening Through Text-Modulated Diffusion Model

AU - Xing, Yinghui

AU - Qu, Litao

AU - Zhang, Shizhou

AU - Feng, Jiapeng

AU - Zhang, Xiuwei

AU - Zhang, Yanning

PY - 2024

Y1 - 2024

N2 - Pansharpening is crucial to remote sensing applications, fusing high-resolution (HR) panchromatic (PAN) images with low-resolution multispectral (LRMS) images to generate HR multispectral (HRMS) images. Recently, diffusion probabilistic models (DPMs) have produced higher-quality results than regression-based methods when trained on specific pairwise data for a specific purpose. However, their performance degrades when they are applied to a new satellite dataset with different imaging properties and spectral ranges, which limits their generalization ability. To improve the generalizability of pansharpening, in this article we propose a text-modulated diffusion model (TMDiff) for unified pansharpening across different satellites. TMDiff uses a text-modulated 3-D UNet (TM3DU) as its denoising network to gradually recover the HRMS image through iterative refinement over multiple time steps. By introducing each satellite's physical properties as text prompts, TM3DU is able to learn meta-knowledge across different satellites and can thus sharpen LRMS images with diverse spatial and spectral attributes. Extensive experiments on various satellite datasets demonstrate the state-of-the-art performance of our model in both qualitative and quantitative evaluations. Furthermore, our model exhibits superior generalization ability to unseen datasets, highlighting its practical significance. Code is available at https://github.com/codgodtao/TMDiff.

KW - Denoising diffusion probabilistic model (DPM)

KW - text-modulated model

KW - unified pansharpening

UR - http://www.scopus.com/inward/record.url?scp=85200207778&partnerID=8YFLogxK

U2 - 10.1109/TGRS.2024.3434431

DO - 10.1109/TGRS.2024.3434431

M3 - Article

AN - SCOPUS:85200207778

SN - 0196-2892

VL - 62

JO - IEEE Transactions on Geoscience and Remote Sensing

JF - IEEE Transactions on Geoscience and Remote Sensing

M1 - 5633812

ER -
