Progressive Learning Vision Transformer for Open Set Recognition of Fine-Grained Objects in Remote Sensing Images

Yimin Fu, Zhunga Liu, Zuowei Zhang

Research output: Contribution to journal › Article › peer-review

13 Scopus citations

Abstract

Open set recognition (OSR) aims to classify known classes and recognize unknown classes simultaneously. Existing OSR methods have primarily focused on learning decision boundaries based on overall feature representations, and have achieved good performance on various coarse-grained image datasets. However, the overall feature representations of objects in fine-grained image datasets are highly similar, making it difficult to distinguish between known and unknown classes by overall feature-based decision boundaries. To address this problem, we propose a progressive learning vision transformer (PLViT) with a coarse-to-fine optimization strategy. In PLViT, the overall feature representations are first optimized in the distance space to learn the initial decision boundaries. Then, a context-aware patch selection module is designed to locate the discriminative part regions. Afterward, the multilayer representations of each selected patch are aggregated according to the self-attention weights, and input into the last transformer layer to extract local feature representations. Finally, overall and local feature representations are adaptively fused and optimized in the angular space to further refine the decision boundaries. Experimental results on four fine-grained remote sensing object recognition datasets show that PLViT outperforms state-of-the-art methods.
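
The open set decision described in the abstract (classify known classes, reject unknown ones) can be illustrated with a minimal sketch. This is not the PLViT implementation. It assumes a generic nearest-prototype rule in which each known class has one prototype vector and a sample is rejected as unknown when its closest prototype is farther than a threshold. The names `open_set_predict`, `UNKNOWN`, `prototypes`, and `threshold` are hypothetical. The two metrics only mirror the abstract's distinction between a distance space and an angular space.

```python
import math

UNKNOWN = -1  # hypothetical label used here for "unknown class"


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def angle(a, b):
    """Angle in radians between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cos = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.acos(cos)


def open_set_predict(feature, prototypes, threshold, metric=euclidean):
    """Return the index of the nearest class prototype, or UNKNOWN if
    even the nearest prototype is farther away than `threshold`."""
    scores = [metric(feature, p) for p in prototypes]
    best = min(range(len(scores)), key=scores.__getitem__)
    return best if scores[best] <= threshold else UNKNOWN


# Toy data (assumed, for illustration only): two known classes in 2-D.
prototypes = [[0.0, 1.0], [1.0, 0.0]]
near_class_0 = open_set_predict([0.1, 0.9], prototypes, threshold=0.5)
far_away = open_set_predict([3.0, 3.0], prototypes, threshold=0.5)
far_angular = open_set_predict([3.0, 3.0], prototypes, 0.5, metric=angle)
print(near_class_0, far_away, far_angular)
```

With a rule of this kind, the difficulty the abstract points to becomes concrete: when fine-grained classes have very similar overall features, unknown samples fall close to known prototypes, so a single threshold on overall features separates them poorly.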

Original language: English
Article number: 5215113
Journal: IEEE Transactions on Geoscience and Remote Sensing
Volume: 61
State: Published - 2023

Keywords

  • Fine-grained visual classification (FGVC)
  • open set recognition (OSR)
  • progressive learning vision transformer (PLViT)
  • remote sensing
