Abstract
Few-shot segmentation (FSS) in remote sensing aims to segment novel categories in query images using only a few annotated support images. Despite extensive research, the large intraclass differences among remote sensing targets continue to hinder progress in this field. Pretrained vision–language models (VLMs) possess strong generalization capabilities, and their cross-modal information can effectively mitigate intraclass variance. However, VLMs rarely focus on dense prediction tasks, and the complexity of remote sensing imagery limits the effectiveness of existing attempts to apply them to FSS. To address this issue, this article proposes an implicit contrastive language-image pretraining (CLIP) prior decoupling network (ICPD-Net), which mines effective cross-modal priors from VLMs and leverages ranking information to improve visual metric strategies. Specifically, the implicit prior decoupling module (IPDM) uses ambiguous foreground–background vision–language similarities to construct class-agnostic prompts, while employing a prior learner to mine implicit vision–language priors that alleviate intraclass differences. To fully leverage cross-modal information, the reliable feature fusion module (RFFM) uses the vision–language priors to obtain high-confidence query features for fusion with support features and further mitigates intraclass differences through a self-support paradigm. Finally, the dual visual priors module (DVPM) introduces a novel rank information prior for visual feature measurement. This prior builds an effective metric learning method by combining the ranking relationships of Euclidean distances between support–query features with the normalized discounted cumulative gain (NDCG) algorithm, while a traditional cosine similarity prior is used alongside it to explore visual metric relationships comprehensively. Extensive experiments on iSAID-5i and DLRSD-5i demonstrate that our method achieves significant improvements. Particularly under the one-shot setting, our approach shows exceptional effectiveness, outperforming state-of-the-art methods by up to 11.48%.
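To make the ranking idea in the abstract concrete, the following is a minimal sketch, not the authors' DVPM implementation. It assumes a single support prototype vector, a handful of query feature vectors, and binary foreground relevance labels, and it scores how well an ascending Euclidean-distance ranking agrees with those labels using the standard NDCG formula. All function names and the toy data are hypothetical and chosen only for illustration; a cosine similarity helper is included because the abstract names it as the second visual prior.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def dcg(relevances):
    """Discounted cumulative gain with the usual log2(rank + 1) discount."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances))


def ndcg_of_distance_ranking(prototype, features, relevances):
    """NDCG of the ranking obtained by sorting features by distance to the prototype.

    Assumption for this sketch: closer features should be more relevant
    (foreground), so the ranking is ascending in Euclidean distance.
    """
    order = sorted(range(len(features)),
                   key=lambda i: euclidean(prototype, features[i]))
    ranked = [relevances[i] for i in order]
    ideal_dcg = dcg(sorted(relevances, reverse=True))
    return dcg(ranked) / ideal_dcg if ideal_dcg > 0 else 0.0


# Toy data (hypothetical): one support prototype and four query features.
prototype = [1.0, 0.0]
features = [[0.9, 0.1], [0.0, 1.0], [0.8, 0.3], [0.1, 0.9]]

# Labels that agree with the distance ranking versus labels that contradict it.
good = ndcg_of_distance_ranking(prototype, features, [1, 0, 1, 0])
bad = ndcg_of_distance_ranking(prototype, features, [0, 1, 0, 1])
print(good, bad)
print([round(cosine(prototype, f), 3) for f in features])
```

In this toy setting the first label assignment places both foreground features nearest the prototype, so its NDCG is higher than that of the second assignment. How the paper converts such a score into a prior map or a training signal is not stated in the abstract and is not modeled here.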
| Original language | English |
|---|---|
| Article number | 5646813 |
| Journal | IEEE Transactions on Geoscience and Remote Sensing |
| Volume | 63 |
| DOIs | |
| State | Published - 2025 |
Keywords
- Cross-modal learning
- few-shot segmentation (FSS)
- metric learning
- remote sensing
Fingerprint
Dive into the research topics of 'Implicit CLIP Prior Decoupling for Few-Shot Remote Sensing Image Segmentation'. Together they form a unique fingerprint.