Progressive Learning Vision Transformer for Open Set Recognition of Fine-Grained Objects in Remote Sensing Images

Yimin Fu; Zhunga Liu; Zuowei Zhang

doi:10.1109/TGRS.2023.3309091


School of Automation

Northwestern Polytechnical University, Xi'an

Research output: Contribution to journal › Article › Peer-reviewed

13 citations (Scopus)

Abstract

Open set recognition (OSR) aims to classify known classes and recognize unknown classes simultaneously. Existing OSR methods have primarily focused on learning decision boundaries based on overall feature representations, and have achieved good performance on various coarse-grained image datasets. However, the overall feature representations of objects in fine-grained image datasets are highly similar, making it difficult to distinguish between known and unknown classes by overall feature-based decision boundaries. To address this problem, we propose a progressive learning vision transformer (PLViT) with a coarse-to-fine optimization strategy. In PLViT, the overall feature representations are first optimized in the distance space to learn the initial decision boundaries. Then, a context-aware patch selection module is designed to locate the discriminative part regions. Afterward, the multilayer representations of each selected patch are aggregated according to the self-attention weights, and input into the last transformer layer to extract local feature representations. Finally, overall and local feature representations are adaptively fused and optimized in the angular space to further refine the decision boundaries. Experimental results on four fine-grained remote sensing object recognition datasets show that PLViT outperforms state-of-the-art methods.

Original language	English
Article number	5215113
Journal	IEEE Transactions on Geoscience and Remote Sensing
Volume	61
DOI	https://doi.org/10.1109/TGRS.2023.3309091
Publication status	Published - 2023


Cite this

@article{80538ad2c73741ddb85265f8ee51c1d4,

title = "Progressive Learning Vision Transformer for Open Set Recognition of Fine-Grained Objects in Remote Sensing Images",

abstract = "Open set recognition (OSR) aims to classify known classes and recognize unknown classes simultaneously. Existing OSR methods have primarily focused on learning decision boundaries based on overall feature representations, and have achieved good performance on various coarse-grained image datasets. However, the overall feature representations of objects in fine-grained image datasets are highly similar, making it difficult to distinguish between known and unknown classes by overall feature-based decision boundaries. To address this problem, we propose a progressive learning vision transformer (PLViT) with a coarse-to-fine optimization strategy. In PLViT, the overall feature representations are first optimized in the distance space to learn the initial decision boundaries. Then, a context-aware patch selection module is designed to locate the discriminative part regions. Afterward, the multilayer representations of each selected patch are aggregated according to the self-attention weights, and input into the last transformer layer to extract local feature representations. Finally, overall and local feature representations are adaptively fused and optimized in the angular space to further refine the decision boundaries. Experimental results on four fine-grained remote sensing object recognition datasets show that PLViT outperforms state-of-the-art methods.",

keywords = "Fine-grained visual classification (FGVC), open set recognition (OSR), progressive learning vision transformer (PLViT), remote sensing",

author = "Yimin Fu and Zhunga Liu and Zuowei Zhang",

note = "Publisher Copyright: {\textcopyright} 1980-2012 IEEE.",

year = "2023",

doi = "10.1109/TGRS.2023.3309091",

language = "English",

volume = "61",

journal = "IEEE Transactions on Geoscience and Remote Sensing",

issn = "0196-2892",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

}

TY - JOUR

T1 - Progressive Learning Vision Transformer for Open Set Recognition of Fine-Grained Objects in Remote Sensing Images

AU - Fu, Yimin

AU - Liu, Zhunga

AU - Zhang, Zuowei

PY - 2023

Y1 - 2023

N2 - Open set recognition (OSR) aims to classify known classes and recognize unknown classes simultaneously. Existing OSR methods have primarily focused on learning decision boundaries based on overall feature representations, and have achieved good performance on various coarse-grained image datasets. However, the overall feature representations of objects in fine-grained image datasets are highly similar, making it difficult to distinguish between known and unknown classes by overall feature-based decision boundaries. To address this problem, we propose a progressive learning vision transformer (PLViT) with a coarse-to-fine optimization strategy. In PLViT, the overall feature representations are first optimized in the distance space to learn the initial decision boundaries. Then, a context-aware patch selection module is designed to locate the discriminative part regions. Afterward, the multilayer representations of each selected patch are aggregated according to the self-attention weights, and input into the last transformer layer to extract local feature representations. Finally, overall and local feature representations are adaptively fused and optimized in the angular space to further refine the decision boundaries. Experimental results on four fine-grained remote sensing object recognition datasets show that PLViT outperforms state-of-the-art methods.

AB - Open set recognition (OSR) aims to classify known classes and recognize unknown classes simultaneously. Existing OSR methods have primarily focused on learning decision boundaries based on overall feature representations, and have achieved good performance on various coarse-grained image datasets. However, the overall feature representations of objects in fine-grained image datasets are highly similar, making it difficult to distinguish between known and unknown classes by overall feature-based decision boundaries. To address this problem, we propose a progressive learning vision transformer (PLViT) with a coarse-to-fine optimization strategy. In PLViT, the overall feature representations are first optimized in the distance space to learn the initial decision boundaries. Then, a context-aware patch selection module is designed to locate the discriminative part regions. Afterward, the multilayer representations of each selected patch are aggregated according to the self-attention weights, and input into the last transformer layer to extract local feature representations. Finally, overall and local feature representations are adaptively fused and optimized in the angular space to further refine the decision boundaries. Experimental results on four fine-grained remote sensing object recognition datasets show that PLViT outperforms state-of-the-art methods.

KW - Fine-grained visual classification (FGVC)

KW - open set recognition (OSR)

KW - progressive learning vision transformer (PLViT)

KW - remote sensing

UR - http://www.scopus.com/inward/record.url?scp=85169697069&partnerID=8YFLogxK

U2 - 10.1109/TGRS.2023.3309091

DO - 10.1109/TGRS.2023.3309091

M3 - Article

AN - SCOPUS:85169697069

SN - 0196-2892

VL - 61

JO - IEEE Transactions on Geoscience and Remote Sensing

JF - IEEE Transactions on Geoscience and Remote Sensing

M1 - 5215113

ER -
