
Lightweight modal-guided cross-attention fusion network for visible-infrared object detection

  • Wencong Wu
  • Hongxi Zhang
  • Xiuwei Zhang
  • Hanlin Yin
  • Yanning Zhang
  • Northwestern Polytechnical University, Xi'an

Research output: Contribution to journal › Article › peer-review

Abstract

Visible-infrared object detection aims to exploit the complementarity of different modalities to improve the accuracy of object classification and localization in complex environments. However, most existing methods prioritize detection performance while neglecting network complexity, limiting their real-world applicability. To this end, we propose a lightweight modal-guided cross-attention fusion network (LCAFNet) for visible-infrared object detection, composed of a visible-guided cross-attention block (VG-CAB), an infrared-guided cross-attention block (IG-CAB), and a gated fusion block (GFB). The VG-CAB and IG-CAB use attention weights from one modality to guide information aggregation from the other, enabling cross-modal interaction and feature fusion from two different perspectives. These two blocks generate complementary features containing visible (VIS) and infrared (IR) information, yielding a comprehensive and robust multimodal feature representation. Benefiting from the enhanced, complementary features produced by the VG-CAB and IG-CAB, the GFB fuses them adaptively and completely through a gating strategy. Further, shallow VIS and IR features extracted from a dual-branch backbone network, which carry richer spatial and edge information, are used to mine and integrate complementary features, improving the localization and classification capabilities of the detection model. Extensive experiments on five commonly used public datasets demonstrate that our proposed LCAFNet achieves better detection performance with lower network complexity than other leading models. Specifically, on the DroneVehicle dataset, LCAFNet outperforms the state-of-the-art model by 1.6% mAP50 while using only one-eighth as many network parameters. The source code for LCAFNet is available at https://github.com/WenCongWu/LCAFNet.
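The gating strategy described for the GFB can be illustrated with a minimal, framework-free sketch: a sigmoid gate decides, per channel, how much of the visible feature to keep, with the remainder taken from the infrared feature. The gate parameters (`w_vis`, `w_ir`, `bias`) and the per-channel scalar formulation below are illustrative assumptions, not the actual GFB design; see the linked repository for the real implementation.

```python
import math

def sigmoid(x):
    """Logistic function, mapping any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def gated_fusion(f_vis, f_ir, w_vis=0.0, w_ir=0.0, bias=0.0):
    """Fuse visible and infrared feature vectors with a per-channel gate.

    For each channel, the gate g = sigmoid(w_vis * v + w_ir * r + bias)
    weights the visible feature; the infrared feature receives (1 - g).
    In a trained network, w_vis, w_ir, and bias would be learned.
    """
    fused = []
    for v, r in zip(f_vis, f_ir):
        g = sigmoid(w_vis * v + w_ir * r + bias)
        fused.append(g * v + (1.0 - g) * r)
    return fused

# With zero gate weights, g = 0.5 everywhere: an equal-weight average.
print(gated_fusion([1.0, 0.0], [0.0, 1.0]))  # → [0.5, 0.5]
```

Because the gate is computed from the features themselves, the fusion weight adapts per location: channels where one modality is uninformative (e.g. VIS at night) can be dominated by the other.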

Original language: English
Article number: 113350
Journal: Pattern Recognition
Volume: 177
DOI
Publication status: Published - Sep 2026
