TY - JOUR
T1 - CAMCFormer
T2 - Cross-Attention and Multi-Correlation Aided Transformer for Few-Shot Object Detection in Optical Remote Sensing Images
AU - Wang, Lefan
AU - Mei, Shaohui
AU - Wang, Yi
AU - Lian, Jiawei
AU - Han, Zonghao
AU - Feng, Yan
N1 - Publisher Copyright: © 1980-2012 IEEE.
PY - 2025
Y1 - 2025
AB - Few-shot object detection (FSOD) enables the detection of novel-class objects in remote sensing images (RSIs) with limited labeled samples. Although convolutional neural networks (CNNs) are commonly used for this task, they suffer from two inherent constraints. First, their limited local receptive field fails to capture global context within a single image and the relational dependencies between query and support images. Second, an additional feature alignment mechanism is typically required to bridge the gap between query and support images. To address these challenges, this work introduces a novel Cross-Attention and Multi-Correlation Aided Transformer (CAMCFormer) FSOD framework tailored for global feature representation and multi-correlation modeling in complex and large-scale RSIs. Specifically, a Long-Distance Cross-Attention Module (LDCAM) is devised to capture dependencies between distant elements across query and support images at each feature extraction layer. This module facilitates the exchange of contextual information between images, resulting in more comprehensive feature representations and eliminating the need for separate feature alignment and fusion modules. To further enhance detection performance, Multi-Correlation Aided Heads (MAHs) are constructed to model various relational aspects, namely a Channel-Correlation Detection Head (CCDH), a Spatial-Correlation Detection Head (SCDH), and a Cross-Attention Detection Head (CADH). These aided heads contribute to more robust and accurate classification and localization. Comprehensive experiments demonstrate the superiority of the proposed framework over several state-of-the-art detectors, highlighting its potential as an effective solution for few-shot object detection in remote sensing scenarios.
KW - Few-shot learning
KW - object detection
KW - optical remote sensing images
KW - transformer
UR - http://www.scopus.com/inward/record.url?scp=85218781568&partnerID=8YFLogxK
U2 - 10.1109/TGRS.2025.3543583
DO - 10.1109/TGRS.2025.3543583
M3 - Article
AN - SCOPUS:85218781568
SN - 0196-2892
JO - IEEE Transactions on Geoscience and Remote Sensing
JF - IEEE Transactions on Geoscience and Remote Sensing
ER -