RIFormer+: Rethinking Rotation-Invariant Feature Learning in Transformer

Abstract
Transformers have achieved remarkable success in computer vision owing to their ability to capture global information in images. However, they fail to model rotation invariance, resulting in significant performance loss in target detection in remote sensing imagery. In this paper, a rotation-invariant transformer plus model, namely RIFormer+, is proposed to enhance the rotation-invariant feature learning of transformers at both the long-overlooked local level and the acknowledged global level. At the local level, a rotation-invariant cross-patch embedding (RICPE) module is designed to generate dense patches, handling the encoding inconsistency of tokens that carry similar semantic information before and after rotation. At the global level, response-enhanced attention (REA) is proposed to extract more rotation-robust global features; it highlights overly dispersed responses to ensure sustained attention on discriminative regions. Extensive experiments on three datasets demonstrate the effectiveness of RIFormer+. Without bells and whistles, RIFormer+ increases classification accuracy by an average of 10% and improves accuracy on rotated datasets by 20% compared with several state-of-the-art transformers.
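The "encoding inconsistency" that RICPE targets can be seen with a toy sketch (this is not the paper's method, just an illustration of the problem): with standard non-overlapping patch embedding, rotating an image re-partitions and re-orders pixels within patches, so a patch with the same content yields a different flattened token vector after rotation. The `patch_tokens` helper below is hypothetical, written only for this demonstration.

```python
import numpy as np

# Toy 4x4 "image" and its 90-degree rotation.
img = np.arange(16, dtype=float).reshape(4, 4)
rot = np.rot90(img)

def patch_tokens(x, p=2):
    """Flatten non-overlapping p x p patches into token vectors,
    as a plain (ViT-style) patch embedding would before projection."""
    h, w = x.shape
    return [x[i:i + p, j:j + p].ravel()
            for i in range(0, h, p) for j in range(0, w, p)]

tokens, tokens_rot = patch_tokens(img), patch_tokens(rot)

# No original token vector survives rotation unchanged, even though some
# patches contain exactly the same pixels (just re-ordered) -- tokens with
# the same semantics receive different encodings after rotation.
same = all(any(np.array_equal(t, r) for r in tokens_rot) for t in tokens)
print(same)  # False
```

A rotation-invariant embedding would instead produce matching token encodings for those semantically identical patches.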
| Original language | English |
|---|---|
| Pages (from-to) | 8423-8435 |
| Number of pages | 13 |
| Journal | IEEE Transactions on Multimedia |
| Volume | 27 |
| DOIs | |
| State | Published - 2025 |
Keywords
- Rotation-invariant feature learning
- remote sensing image classification
- transformer