TY - JOUR
T1 - Progressive Learning Vision Transformer for Open Set Recognition of Fine-Grained Objects in Remote Sensing Images
AU - Fu, Yimin
AU - Liu, Zhunga
AU - Zhang, Zuowei
N1 - Publisher Copyright: © 1980-2012 IEEE.
PY - 2023
Y1 - 2023
AB - Open set recognition (OSR) aims to classify known classes and recognize unknown classes simultaneously. Existing OSR methods have primarily focused on learning decision boundaries based on overall feature representations, and have achieved good performance on various coarse-grained image datasets. However, the overall feature representations of objects in fine-grained image datasets are highly similar, making it difficult to distinguish between known and unknown classes by overall feature-based decision boundaries. To address this problem, we propose a progressive learning vision transformer (PLViT) with a coarse-to-fine optimization strategy. In PLViT, the overall feature representations are first optimized in the distance space to learn the initial decision boundaries. Then, a context-aware patch selection module is designed to locate the discriminative part regions. Afterward, the multilayer representations of each selected patch are aggregated according to the self-attention weights, and input into the last transformer layer to extract local feature representations. Finally, overall and local feature representations are adaptively fused and optimized in the angular space to further refine the decision boundaries. Experimental results on four fine-grained remote sensing object recognition datasets show that PLViT outperforms state-of-the-art methods.
KW - Fine-grained visual classification (FGVC)
KW - open set recognition (OSR)
KW - progressive learning vision transformer (PLViT)
KW - remote sensing
UR - http://www.scopus.com/inward/record.url?scp=85169697069&partnerID=8YFLogxK
U2 - 10.1109/TGRS.2023.3309091
DO - 10.1109/TGRS.2023.3309091
M3 - Article
AN - SCOPUS:85169697069
SN - 0196-2892
VL - 61
JO - IEEE Transactions on Geoscience and Remote Sensing
JF - IEEE Transactions on Geoscience and Remote Sensing
M1 - 5215113
ER -