USTNet: A U-Net Swin Transformer Network for Aerial Visible-to-Infrared Image Translation

  • Zonghao Han
  • Xiaoning Chen
  • Zixiang Ye
  • Yuru Su
  • Lefan Wang
  • Shaohui Mei
  • Northwestern Polytechnical University Xian

Research output: Contribution to journal › Article › peer-review

5 Scopus citations

Abstract

Aerial visible-to-infrared image translation technology aims to generate infrared images from visible inputs, effectively expanding the acquisition of infrared images in complex aerial remote sensing scenarios. Although existing convolutional neural network (CNN)-based approaches demonstrate proficiency in capturing local spatial details, they often fail to model global scene context, which is crucial for maintaining structural consistency in complex aerial scenes. To address this challenge, a hybrid architecture network is proposed to model long-range dependencies while preserving fine-grained local details via an encoder–decoder framework. We further introduce a dynamic parallel window-based attention (DPWA) mechanism, which dynamically parallelizes window-based and shifted window-based multihead self-attention (W-MSA) in separate streams across transformer blocks to enhance global context modeling. Additionally, a masked image pretraining framework with wavelet transform loss is designed to guide multiscale feature alignment and high-frequency detail reconstruction, effectively addressing the texture discrepancy between visible and infrared modalities. Extensive experiments conducted on the upgraded aerial visible-to-infrared image translation tasks (AVIID) and DroneVehicle benchmark datasets demonstrate that our method significantly outperforms current state-of-the-art (SOTA) approaches in terms of both visual quality and perceptual metrics.
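
The abstract does not give the internals of DPWA, so the following is only a minimal NumPy sketch of the general idea it names: running regular window attention and shifted window attention as two parallel streams over the same feature map and merging the results. The function names, the fixed blend weight `alpha`, and the projection-free single-head attention are all assumptions made for illustration and are not taken from the paper. The attention mask that Swin-style models apply to wrapped-around tokens in shifted windows is omitted for brevity.

```python
import numpy as np

def window_partition(x, ws):
    """Split an (H, W, C) map into non-overlapping ws x ws windows: (N, ws*ws, C)."""
    H, W, C = x.shape
    assert H % ws == 0 and W % ws == 0, "H and W must be multiples of ws"
    x = x.reshape(H // ws, ws, W // ws, ws, C).transpose(0, 2, 1, 3, 4)
    return x.reshape(-1, ws * ws, C)

def window_reverse(windows, ws, H, W):
    """Inverse of window_partition: (N, ws*ws, C) back to (H, W, C)."""
    C = windows.shape[-1]
    x = windows.reshape(H // ws, W // ws, ws, ws, C).transpose(0, 2, 1, 3, 4)
    return x.reshape(H, W, C)

def shifted_window_partition(x, ws):
    """Cyclically shift the map by ws // 2 before partitioning (Swin-style)."""
    shift = ws // 2
    return window_partition(np.roll(x, (-shift, -shift), axis=(0, 1)), ws)

def self_attention(tokens):
    """Single-head attention without learned projections (illustrative only)."""
    d = tokens.shape[-1]
    scores = tokens @ tokens.transpose(0, 2, 1) / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ tokens

def parallel_window_attention(x, ws, alpha=0.5):
    """Hypothetical two-stream block: blend regular and shifted window attention.

    `alpha` is an assumed fixed blend weight; the paper's dynamic weighting
    is not specified in the abstract.
    """
    H, W, _ = x.shape
    shift = ws // 2
    regular = window_reverse(self_attention(window_partition(x, ws)), ws, H, W)
    shifted = window_reverse(self_attention(shifted_window_partition(x, ws)), ws, H, W)
    shifted = np.roll(shifted, (shift, shift), axis=(0, 1))  # undo the cyclic shift
    return alpha * regular + (1.0 - alpha) * shifted

features = np.random.default_rng(0).normal(size=(8, 8, 4))
mixed = parallel_window_attention(features, ws=4)
```

In this sketch the shifted stream groups pixels that sit on opposite sides of a regular window boundary, so blending the two streams lets information cross window borders within a single block instead of across two consecutive blocks.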

Original language: English
Article number: 5640814
Journal: IEEE Transactions on Geoscience and Remote Sensing
Volume: 63
State: Published - 2025

Keywords

  • Aerial visible-to-infrared image translation
  • image-to-image (I2I) translation
  • remote sensing image processing
