RIFormer+: Rethinking Rotation-Invariant Feature Learning in Transformer

Research output: Contribution to journal › Article › peer-review

1 Scopus citation

Abstract

Transformers have achieved remarkable success in the field of computer vision due to their advantage in capturing the global information of images. However, they fail to model rotational variance, resulting in significant performance loss in target detection in remote sensing imagery. In this paper, a rotation-invariant transformer plus model, namely RIFormer+, is proposed to enhance the capabilities of transformers in rotation-invariant feature learning at both the long-overlooked local level and the acknowledged global level. At the local level, a rotation-invariant cross-patch embedding (RICPE) module is designed to generate dense patches, handling the encoding inconsistency of tokens that carry similar semantic information before and after rotation. Moreover, response-enhanced attention (REA) is proposed to extract more rotation-robust global features; it highlights overly dispersed responses to ensure sustained attention on discriminative regions. Extensive experiments on three datasets demonstrate the effectiveness of RIFormer+. Without bells and whistles, RIFormer+ increases classification accuracy by an average of 10% and improves accuracy on rotated datasets by 20% compared with several state-of-the-art transformers.
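To illustrate the general idea of encoding a patch so that its token is the same before and after rotation, one can pool the patch over its 90-degree rotations before the linear projection. This is only a minimal sketch of rotation-invariant token encoding, not the paper's RICPE module; the function name and the projection matrix `weight` are hypothetical:

```python
import numpy as np

def rotation_pooled_patch_embedding(patch, weight):
    """Embed a square image patch (H, W, C) so that the resulting token
    is identical for all 90-degree rotations of the patch.

    NOTE: illustrative sketch only, not the RICPE module from the paper.
    `weight` is a hypothetical linear projection of shape (H*W*C, D).
    """
    # Average the flattened patch over its four 90-degree rotations.
    # Rotating the input merely permutes this set of rotations, so the
    # pooled vector -- and hence the projected token -- is unchanged.
    rotations = [np.rot90(patch, k) for k in range(4)]
    pooled = np.mean([r.reshape(-1) for r in rotations], axis=0)
    return pooled @ weight
```

A pooled embedding like this trades some discriminative detail for exact invariance under 90-degree rotations; the paper's dense cross-patch design addresses the same inconsistency with a learned module rather than simple averaging.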

Original language: English
Pages (from-to): 8423-8435
Number of pages: 13
Journal: IEEE Transactions on Multimedia
Volume: 27
DOIs
State: Published - 2025

Keywords

  • Rotation-invariant feature learning
  • remote sensing image classification
  • transformer

