
STAIT: A Spatio-Temporal Alternating Iterative Transformer for Multi-Temporal Remote Sensing Image Cloud Removal

  • Yukun Cui
  • Jiangshe Zhang
  • Haowen Bai
  • Zixiang Zhao
  • Lilun Deng
  • Shuang Xu
  • Chunxia Zhang
  • Xi'an Jiaotong University

Research output: Contribution to journal, Article, peer-reviewed

Abstract

Highlights

What are the main findings?

  • A novel Spatio-Temporal Alternating Iterative Transformer (STAIT) is proposed to explicitly model the dynamic dependencies in the multi-temporal remote sensing image cloud removal task.
  • An efficient framework combining multi-level feature extraction and a weight-sharing decoder is designed to ensure high-quality, temporally consistent reconstruction.

What are the implications of the main findings?

  • The method significantly improves cloud removal accuracy, effectively restoring surface details obscured by thick clouds.
  • It provides a robust and efficient solution for generating continuous remote sensing data, enhancing the reliability of Earth observation applications.

Multi-temporal remote sensing image cloud removal aims to reconstruct land surface information in regions obscured by clouds and their shadows, thereby mitigating a major constraint on the application of remote sensing imagery. However, existing multi-temporal deep learning methods for cloud removal often fail to model complex spatio-temporal dynamics, leading to suboptimal performance. To address this challenge, we propose a novel framework for multi-temporal cloud removal. Its core component is the Spatio-Temporal Alternating Iterative Transformer (STAIT), which consists primarily of temporal and spatial attention mechanisms. STAIT is designed to refine the spatio-temporal feature representation by establishing an effective interplay between spatial details and temporal dynamics. The framework is complemented by two further components: an efficient image token generator, which uses group convolution-based multi-level feature extraction to manage complexity, and a pixel reconstruction decoder, whose shared progressive upsampling network improves reconstruction by learning time-invariant features. Experimental results demonstrate that, by explicitly modeling spatio-temporal feature dependencies, our approach achieves superior performance in restoring high-fidelity, cloud-free imagery.
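
The abstract does not give the internals of STAIT, so the following is only a minimal sketch of the general idea of alternating temporal and spatial attention, not the authors' implementation. It assumes image tokens arranged as an array of shape (T, N, D) (T acquisition dates, N spatial tokens per image, D channels), and it uses single-head attention with no learned projections, residual connections, and a hypothetical `num_iters` parameter; all function names are illustrative.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(x):
    """Single-head scaled dot-product self-attention.

    x has shape (..., L, D); attention runs over the L axis.
    Assumption: queries, keys and values are the tokens themselves
    (no learned projections), to keep the sketch short.
    """
    d = x.shape[-1]
    scores = x @ np.swapaxes(x, -1, -2) / np.sqrt(d)  # (..., L, L)
    return softmax(scores, axis=-1) @ x               # (..., L, D)


def alternating_block(tokens, num_iters=2):
    """Alternate temporal and spatial attention on (T, N, D) tokens.

    num_iters is a hypothetical parameter; the source does not state
    how many alternations are used.
    """
    x = tokens
    for _ in range(num_iters):
        # Temporal step: each spatial position attends across the T dates.
        xt = np.swapaxes(x, 0, 1)          # (N, T, D)
        xt = xt + self_attention(xt)       # residual connection
        x = np.swapaxes(xt, 0, 1)          # back to (T, N, D)
        # Spatial step: each date attends across its N spatial tokens.
        x = x + self_attention(x)
    return x


# Usage: 3 dates, 16 tokens per image, 8 channels of random features.
rng = np.random.default_rng(0)
tokens = rng.standard_normal((3, 16, 8))
out = alternating_block(tokens)
```

Alternating the two steps keeps each attention matrix small (T by T or N by N) instead of attending over all T·N tokens at once, which is one common motivation for factorized spatio-temporal attention.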

Original language: English
Article number: 596
Journal: Remote Sensing
Volume: 18
Issue number: 4
State: Published - Feb 2026

Keywords

  • multi-temporal cloud removal
  • remote sensing image
  • transformer
