SF-Former: Feature-enhanced network with transformer for Pedestrian Detection

Pengyao Zhou, Xin Ning, Meibo Lv, Lei Zhang, Buhong Zhang, Zhiwen Wen

Research output: Contribution to journal › Article › peer-review

Abstract

Crowding caused by overlap among similar objects is a significant challenge in two-dimensional visual object detection. However, the adoption of end-to-end and binary classification approaches has made existing DETR-based detectors heavily reliant on positional encoding. To address these issues, we propose a feature enhancement network based on positional encoding correction of overlapping regions. First, considering the limitations of the encoder in extracting and discriminating overlapping regions, we introduce a non-parametric Fourier transform module (NPFT). The NPFT incorporates edge information into the encoder, improving its ability to distinguish overlapping from non-overlapping regions while ensuring accurate positional encoding for overlapping targets. Second, to address the insufficient localisation accuracy for overlapping targets in crowded scenes, we propose the squeeze-and-excitation feedforward network (SFFN). By fusing a positional attention mechanism with self-attention mechanisms, the SFFN enhances the decoder's ability to correct the coordinates of query objects.
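The abstract does not specify how the NPFT or the SFFN is built, so the following is only a generic sketch of the two underlying ideas, not the authors' implementation. It assumes that "non-parametric" means a fixed Fourier-domain filter with no learned weights, and that the squeeze-and-excitation part follows the usual channel-gating pattern. The function names, the `cutoff` parameter, and the weight matrices `w1` and `w2` are hypothetical.

```python
import numpy as np

def fourier_edge_enhance(feat, cutoff=0.1):
    """Add the high-frequency (edge-like) part of a (C, H, W) feature map back to it.

    No learnable parameters: the filter is a fixed high-pass mask in the
    Fourier domain. `cutoff` is an assumed normalised-frequency threshold.
    """
    _, h, w = feat.shape
    spec = np.fft.fft2(feat, axes=(-2, -1))
    fy = np.fft.fftfreq(h)[:, None]            # vertical frequencies
    fx = np.fft.fftfreq(w)[None, :]            # horizontal frequencies
    radius = np.sqrt(fy ** 2 + fx ** 2)
    high_pass = (radius > cutoff).astype(float)  # zero out low frequencies
    edges = np.fft.ifft2(spec * high_pass, axes=(-2, -1)).real
    return feat + edges

def squeeze_excite(feat, w1, w2):
    """Standard squeeze-and-excitation gating on a (C, H, W) feature map.

    w1: (C // r, C) and w2: (C, C // r) are hypothetical bottleneck weights.
    """
    squeezed = feat.mean(axis=(1, 2))               # squeeze: global average pool
    hidden = np.maximum(w1 @ squeezed, 0.0)         # ReLU bottleneck
    gate = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))     # sigmoid gate per channel
    return feat * gate[:, None, None]               # excite: rescale channels
```

A flat feature map has no high-frequency content and passes through `fourier_edge_enhance` unchanged, while a map containing a sharp boundary has that boundary amplified, which is the sense in which such a filter injects edge information without adding parameters.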
