
CDFNet: Cross-dimension fusion network with dual feature enhancement for multimodal object detection

  • Wencong Wu
  • Xiuwei Zhang
  • Hanlin Yin
  • Haorui Zeng
  • Chenxu Wei
  • Lei Yu
  • Yanning Zhang
  • Northwestern Polytechnical University Xian

Research output: Contribution to journal › Article › peer-review

Abstract

Multimodal object detection aims to exploit the complementarity between different modalities to improve detection results. However, most existing methods enhance inter-modality features only through the interaction of spatial information and neglect the interaction of channel information between modalities, which leaves cross-modal features insufficiently enhanced. Moreover, many detection models fuse multimodal features within a single feature dimension and do not exploit multi-dimensional information, so the multimodal feature information is not fully utilized. To address these drawbacks, we propose a cross-dimension fusion network with dual feature enhancement (CDFNet) for visible and infrared object detection. Specifically, a dual feature enhancement module (DFEM) is designed to enhance cross-modal representations by modeling multiplicative interactions at both the spatial and channel levels. Furthermore, a cross-dimension feature fusion module (CDFFM) is developed to fully integrate the enhanced features by capturing dependencies across different dimensions, yielding a more discriminative fused feature. Extensive experiments demonstrate that CDFNet achieves 1.8% higher mAP on the LLVIP dataset than the state-of-the-art detection method, and offers more competitive network complexity than Transformer-based and Mamba-based models. The code of CDFNet is released at https://github.com/WenCongWu/CDFNet.
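The abstract describes DFEM only at a high level. As a rough illustration of what a "multiplicative interaction at both the spatial and channel levels" between visible and infrared features can look like, the pure-Python sketch below gates each modality with channel-wise and pixel-wise weights computed from the other modality. This is an assumption for illustration only: the gating scheme (sigmoid of average-pooled descriptors, applied in residual form) and every function name here are hypothetical and are not the authors' DFEM; the released repository is the authoritative reference.

```python
import math
from typing import List, Tuple

# A feature map is stored as nested lists indexed [channel][row][col].
# Both modalities are assumed to share the same (C, H, W) shape.
FeatureMap = List[List[List[float]]]


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def channel_weights(f: FeatureMap) -> List[float]:
    """One gate per channel: sigmoid of that channel's global average (channel-level cue)."""
    weights = []
    for channel in f:
        values = [v for row in channel for v in row]
        weights.append(sigmoid(sum(values) / len(values)))
    return weights


def spatial_weights(f: FeatureMap) -> List[List[float]]:
    """One gate per pixel: sigmoid of the mean across channels (spatial-level cue)."""
    num_c = len(f)
    height, width = len(f[0]), len(f[0][0])
    return [
        [sigmoid(sum(f[c][i][j] for c in range(num_c)) / num_c) for j in range(width)]
        for i in range(height)
    ]


def enhance(x: FeatureMap, guide: FeatureMap) -> FeatureMap:
    """Hypothetical residual multiplicative enhancement of x, gated by the other modality.

    out[c][i][j] = x[c][i][j] * (1 + channel_gate[c] * spatial_gate[i][j])
    """
    cw = channel_weights(guide)
    sw = spatial_weights(guide)
    return [
        [
            [x[c][i][j] * (1.0 + cw[c] * sw[i][j]) for j in range(len(x[c][i]))]
            for i in range(len(x[c]))
        ]
        for c in range(len(x))
    ]


def dual_enhance(visible: FeatureMap, infrared: FeatureMap) -> Tuple[FeatureMap, FeatureMap]:
    """Each modality is enhanced by gates derived from the other one (cross-modal)."""
    return enhance(visible, infrared), enhance(infrared, visible)


if __name__ == "__main__":
    vis = [[[1.0, 2.0], [3.0, 4.0]]]  # 1 channel, 2x2 map
    ir = [[[0.0, 0.0], [0.0, 0.0]]]   # all-zero infrared map: every gate is sigmoid(0) = 0.5
    vis_e, ir_e = dual_enhance(vis, ir)
    print(vis_e)  # [[[1.25, 2.5], [3.75, 5.0]]]
    print(ir_e)   # [[[0.0, 0.0], [0.0, 0.0]]]
```

The residual form `x * (1 + gate)` keeps the original features intact when the gates are small, and the product of a channel gate and a spatial gate is one simple way to make the interaction multiplicative along both axes at once. A learned module would normally insert trainable projections before the sigmoids; they are omitted here to keep the sketch dependency-free.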

Original language: English
Article number: 132380
Journal: Expert Systems with Applications
Volume: 322
DOIs
State: Published - 1 Aug 2026

Keywords

  • Multimodal object detection
  • cross-dimension feature fusion
  • feature enhancement
  • feature interaction
