Lightweight modal-guided cross-attention fusion network for visible-infrared object detection

Abstract
Visible-infrared object detection aims to exploit the complementarity of different modalities to improve the accuracy of object classification and localization in complex environments. However, most existing methods prioritize detection performance while neglecting network complexity, which limits their real-world applicability. To this end, we propose a lightweight modal-guided cross-attention fusion network (LCAFNet) for visible-infrared object detection, composed of a visible-guided cross-attention block (VG-CAB), an infrared-guided cross-attention block (IG-CAB), and a gated fusion block (GFB). The VG-CAB and IG-CAB use attention weights from one modality to guide information aggregation from the other, enabling cross-modal information interaction and feature fusion from two different perspectives. These two blocks generate complementary features containing visible (VIS) and infrared (IR) information, yielding a comprehensive and robust multimodal feature representation. Building on the enhanced, complementary features produced by the VG-CAB and IG-CAB, the GFB fuses them adaptively and completely through a gating strategy. In addition, shallow VIS and IR features extracted from a dual-branch backbone network, which carry richer spatial and edge information, are used to mine and integrate complementary cues, thereby improving the localization and classification capabilities of the detection model. Extensive experiments on five commonly used public datasets demonstrate that the proposed LCAFNet achieves better detection performance with lower network complexity than other strong models. Specifically, on the DroneVehicle dataset, LCAFNet outperforms the state-of-the-art model by 1.6% mAP50 while using only one-eighth as many network parameters. The source code for LCAFNet is available at https://github.com/WenCongWu/LCAFNet.
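The modal-guided cross-attention and gated fusion described above can be sketched roughly as follows. This is a minimal NumPy illustration of the general mechanism, not the paper's actual implementation: the projection weights, token shapes, and the difference-based sigmoid gate are all illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(guide, source, d_k=16, seed=0):
    # Modal-guided attention: the guiding modality supplies queries,
    # the other modality supplies keys and values (hypothetical weights).
    rng = np.random.default_rng(seed)
    Wq = rng.standard_normal((guide.shape[-1], d_k))
    Wk = rng.standard_normal((source.shape[-1], d_k))
    Wv = rng.standard_normal((source.shape[-1], d_k))
    Q, K, V = guide @ Wq, source @ Wk, source @ Wv
    attn = softmax(Q @ K.T / np.sqrt(d_k))       # (tokens, tokens)
    return attn @ V                              # aggregated source features

def gated_fusion(a, b):
    # Toy gate: a sigmoid of the feature difference blends the two
    # complementary feature maps element-wise (assumed form, not the GFB).
    g = 1.0 / (1.0 + np.exp(-(a - b)))
    return g * a + (1.0 - g) * b

# Toy features: 8 spatial tokens with 32 channels per modality.
vis = np.random.default_rng(1).standard_normal((8, 32))
ir = np.random.default_rng(2).standard_normal((8, 32))

vg = cross_attention(vis, ir)    # visible-guided: VIS queries attend over IR
ig = cross_attention(ir, vis)    # infrared-guided: IR queries attend over VIS
fused = gated_fusion(vg, ig)
print(fused.shape)  # (8, 16)
```

The two complementary outputs `vg` and `ig` play the roles of the VG-CAB and IG-CAB features, and the gate decides per element how much each contributes to the fused representation.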
| Original language | English |
|---|---|
| Article number | 113350 |
| Journal | Pattern Recognition |
| Volume | 177 |
| DOIs | |
| State | Published - Sep 2026 |
Keywords
- Cross-attention
- Gated fusion
- Lightweight network
- Visible-infrared object detection