TY - GEN
T1 - Selective Multi-Scale Feature Fusion Network for Object Detection in Autonomous Driving
AU - Dong, Chuan Yong
AU - Li, Ying
AU - Du, Hanhan
AU - Fang, Aiqing
N1 - Publisher Copyright:
© 2025 IEEE.
PY - 2025
Y1 - 2025
AB - Object detection is essential for autonomous driving, as it provides crucial perception of surrounding environments by locating and classifying semantic targets to support trajectory planning and ensure driving safety. However, traditional methods still exhibit insufficient boundary representation and ineffective multi-scale feature fusion, leading to suboptimal performance in complex scenes. To address these issues, we propose a selective multi-scale feature fusion network built upon a single-stage detection framework. The framework integrates three main components: a Local-Enhanced Global Modeling (LEGM) module that combines convolution and self-attention to strengthen multi-scale feature representation, a Selective Boundary Aggregation (SBA) module that enhances contour information and deep semantics, and a lightweight Transformer-based decoder that adaptively filters queries to improve robustness. This design ensures modular flexibility, training stability, and improved detection accuracy. Experiments on widely used benchmarks demonstrate that the proposed method achieves high localization accuracy while maintaining near real-time inference. Compared with YOLOv13, the mAP@0.5 improves by 4.8%, small-object detection accuracy increases by 12.8%, and performance improves by more than 5% under challenging conditions such as low illumination and occlusion.
KW - Deep Learning
KW - Feature Fusion
KW - Object Detection
UR - https://www.scopus.com/pages/publications/105035387985
U2 - 10.1109/ICCVIT67848.2025.11391336
DO - 10.1109/ICCVIT67848.2025.11391336
M3 - Conference contribution
AN - SCOPUS:105035387985
T3 - 2025 3rd International Conference on Computer, Vision and Intelligent Technology, ICCVIT 2025 - Proceedings
BT - 2025 3rd International Conference on Computer, Vision and Intelligent Technology, ICCVIT 2025 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 3rd International Conference on Computer, Vision and Intelligent Technology, ICCVIT 2025
Y2 - 31 October 2025 through 2 November 2025
ER -