TY - JOUR
T1 - CAMCFormer
T2 - Cross-Attention and Multicorrelation Aided Transformer for Few-Shot Object Detection in Optical Remote Sensing Images
AU - Wang, Lefan
AU - Mei, Shaohui
AU - Wang, Yi
AU - Lian, Jiawei
AU - Han, Zonghao
AU - Feng, Yan
N1 - Publisher Copyright:
© 2025 IEEE. All rights reserved.
PY - 2025
Y1 - 2025
N2 - Few-shot object detection (FSOD) enables the detection of novel-class objects in remote sensing images (RSIs) with limited labeled samples. Although convolutional neural networks (CNNs) are commonly used for this task, they suffer from two inherent constraints. First, their limited local receptive field fails to capture both the global context within a single image and the relational dependencies between query and support images. Second, an additional feature alignment mechanism is typically required to bridge the gap between query and support images. To address these challenges, this work introduces a novel cross-attention and multicorrelation aided transformer (CAMCFormer) FSOD framework tailored for global feature representation and multicorrelation modeling in complex and large-scale RSIs. Specifically, a long-distance cross-attention module (LDCAM) is devised to capture dependencies between distant elements across query and support images at each feature extraction layer. This module facilitates the exchange of contextual information between images, resulting in more comprehensive feature representations and eliminating the need for separate feature alignment and fusion modules. To further enhance detection performance, multicorrelation aided heads (MAHs) are constructed to model various relational aspects, namely a channel-correlation detection head (CCDH), a spatial-correlation detection head (SCDH), and a cross-attention detection head (CADH). These aided heads contribute to more robust and accurate classification and localization. Comprehensive experiments demonstrate the superiority of the proposed framework over several state-of-the-art detectors, highlighting its potential as an effective solution for FSOD in remote sensing scenarios.
KW - Few-shot learning (FSL)
KW - object detection
KW - optical remote sensing images (RSIs)
KW - transformer
UR - http://www.scopus.com/inward/record.url?scp=105001078293&partnerID=8YFLogxK
U2 - 10.1109/TGRS.2025.3543583
DO - 10.1109/TGRS.2025.3543583
M3 - Article
AN - SCOPUS:105001078293
SN - 0196-2892
VL - 63
JO - IEEE Transactions on Geoscience and Remote Sensing
JF - IEEE Transactions on Geoscience and Remote Sensing
M1 - 5613316
ER -