TY - JOUR
T1 - Exploiting foundational spatial semantic prior for accurate few-shot hyperspectral image classification
AU - Zhao, Xingbing
AU - Zhang, Lei
AU - Zhang, Lei
AU - Ren, Weixin
AU - Bai, Pengfei
AU - Wei, Wei
AU - Ding, Chen
AU - Zhang, Yanning
N1 - Publisher Copyright: © 2026 Elsevier Ltd. All rights are reserved, including those for text and data mining, AI training, and similar technologies.
PY - 2026/8/1
Y1 - 2026/8/1
N2 - Numerous deep neural networks with various architectures have been developed for hyperspectral image (HSI) classification, i.e., identifying the class of each pixel based on its spatial-spectral characteristics. However, because no large-scale annotated HSI dataset comparable to ImageNet exists for sufficient pre-training, these networks struggle to comprehensively exploit the complicated spatial semantic knowledge (e.g., context and global and local similarity) of the input HSI and thus show limited generalization capacity, especially in challenging few-shot scenarios. To mitigate this problem, we propose a novel spatial semantic knowledge transfer framework (SSPF), which attempts to borrow the powerful representation capacity of foundation models (e.g., SAM, DINO, and CLIP) for capturing complicated spatial semantic knowledge in natural images to enhance the feature representation of HSI during few-shot learning. Specifically, the framework comprises a patch-free HSI classification (HSIC) network and a spatial semantic knowledge transfer network. The former uses a U-shaped network to predict the class of every pixel without patch sampling, while the latter first learns to dynamically compress the input HSI, pixel by pixel, into a three-band pseudo-color image and then feeds it into the subsequent foundation model to extract spatial semantic knowledge. This pixel-wise compression scheme efficiently gathers the useful spatial semantic knowledge distributed across different spectral bands into a three-band image suited to the foundation model, enabling better spatial semantic knowledge extraction. Moreover, the extracted semantic knowledge is enhanced by channel-wise attention and integrated into the feature representation of the input HSI for classification in a coarse-to-fine manner. In this way, the spatial semantic knowledge extracted by foundation models pretrained on large-scale natural images can be appropriately adapted and injected into the cross-domain HSI, thus enhancing generalization performance in few-shot learning. Experiments on six benchmark HSI datasets demonstrate the superiority of the proposed method over existing state-of-the-art baselines in few-shot classification. The code will be available at https://github.com/zhaoxb2025/SSPF-main.
AB - Numerous deep neural networks with various architectures have been developed for hyperspectral image (HSI) classification, i.e., identifying the class of each pixel based on its spatial-spectral characteristics. However, because no large-scale annotated HSI dataset comparable to ImageNet exists for sufficient pre-training, these networks struggle to comprehensively exploit the complicated spatial semantic knowledge (e.g., context and global and local similarity) of the input HSI and thus show limited generalization capacity, especially in challenging few-shot scenarios. To mitigate this problem, we propose a novel spatial semantic knowledge transfer framework (SSPF), which attempts to borrow the powerful representation capacity of foundation models (e.g., SAM, DINO, and CLIP) for capturing complicated spatial semantic knowledge in natural images to enhance the feature representation of HSI during few-shot learning. Specifically, the framework comprises a patch-free HSI classification (HSIC) network and a spatial semantic knowledge transfer network. The former uses a U-shaped network to predict the class of every pixel without patch sampling, while the latter first learns to dynamically compress the input HSI, pixel by pixel, into a three-band pseudo-color image and then feeds it into the subsequent foundation model to extract spatial semantic knowledge. This pixel-wise compression scheme efficiently gathers the useful spatial semantic knowledge distributed across different spectral bands into a three-band image suited to the foundation model, enabling better spatial semantic knowledge extraction. Moreover, the extracted semantic knowledge is enhanced by channel-wise attention and integrated into the feature representation of the input HSI for classification in a coarse-to-fine manner. In this way, the spatial semantic knowledge extracted by foundation models pretrained on large-scale natural images can be appropriately adapted and injected into the cross-domain HSI, thus enhancing generalization performance in few-shot learning. Experiments on six benchmark HSI datasets demonstrate the superiority of the proposed method over existing state-of-the-art baselines in few-shot classification. The code will be available at https://github.com/zhaoxb2025/SSPF-main.
KW - Few-shot learning (FSL)
KW - Foundation models
KW - Hyperspectral image (HSI) classification
UR - https://www.scopus.com/pages/publications/105035645787
U2 - 10.1016/j.eswa.2026.132252
DO - 10.1016/j.eswa.2026.132252
M3 - Article
AN - SCOPUS:105035645787
SN - 0957-4174
VL - 322
JO - Expert Systems with Applications
JF - Expert Systems with Applications
M1 - 132252
ER -