Abstract
Multimodal object detection exploits the complementarity between modalities to improve detection results. However, most existing methods enhance cross-modal features only through the interaction of spatial information and neglect the interaction of channel information between modalities, which leaves the cross-modal features insufficiently enhanced. Moreover, many detection models fuse multimodal features within a single feature dimension and do not use multi-dimensional information, so multimodal feature information is not fully exploited. To address these drawbacks, we propose a cross-dimension fusion network with dual feature enhancement (CDFNet) for visible and infrared object detection. Specifically, a dual feature enhancement module (DFEM) is designed to enhance cross-modal representations by modeling multiplicative interactions at both the spatial and channel levels. Furthermore, a cross-dimension feature fusion module (CDFFM) is developed to fully integrate the enhanced features by capturing dependencies across different dimensions, yielding a more discriminative fused feature. Extensive experiments demonstrate that the proposed CDFNet achieves 1.8% higher mAP on the LLVIP dataset than the state-of-the-art detection method and exhibits more competitive network complexity than transformer-based and Mamba-based models. The code for CDFNet is released at https://github.com/WenCongWu/CDFNet.
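The abstract describes DFEM only at a high level, so the following is a minimal illustrative sketch of what a multiplicative cross-modal interaction at the channel and spatial levels could look like. It is not the authors' implementation (see the released repository for that). The function names `channel_gate`, `spatial_gate`, and `enhance`, the sigmoid gating, the pooling choices, and the residual connection are all assumptions made for illustration.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def channel_gate(feat):
    """Assumed channel-level gate: one weight per channel,
    the sigmoid of that channel's global average."""
    return [
        sigmoid(sum(sum(row) for row in ch) / (len(ch) * len(ch[0])))
        for ch in feat
    ]

def spatial_gate(feat):
    """Assumed spatial-level gate: one weight per pixel,
    the sigmoid of the mean over channels at that pixel."""
    c, h, w = len(feat), len(feat[0]), len(feat[0][0])
    return [
        [sigmoid(sum(feat[k][i][j] for k in range(c)) / c) for j in range(w)]
        for i in range(h)
    ]

def enhance(target, source):
    """Modulate `target` multiplicatively by gates computed from `source`,
    then add the result back to `target` (residual connection, assumed)."""
    cg = channel_gate(source)
    sg = spatial_gate(source)
    return [
        [[x + x * cg[k] * sg[i][j] for j, x in enumerate(row)]
         for i, row in enumerate(ch)]
        for k, ch in enumerate(target)
    ]

# Toy feature maps laid out as [channel][row][column]: 2 channels of 2x2.
visible = [[[1.0, 2.0], [3.0, 4.0]], [[0.5, 0.5], [0.5, 0.5]]]
infrared = [[[0.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]]

# Each modality is enhanced using gates derived from the other one.
visible_enh = enhance(visible, infrared)
infrared_enh = enhance(infrared, visible)
```

In this sketch each modality is gated by statistics of the other, so the interaction is cross-modal and multiplicative along two axes. A learned module would typically replace the fixed pooling and sigmoid with trainable layers.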
| Original language | English |
|---|---|
| Article number | 132380 |
| Journal | Expert Systems with Applications |
| Volume | 322 |
| State | Published - 1 Aug 2026 |
Keywords
- Multimodal object detection
- Cross-dimension feature fusion
- Feature enhancement
- Feature interaction
Title
CDFNet: Cross-dimension fusion network with dual feature enhancement for multimodal object detection