TY - JOUR
T1 - SF-Former
T2 - Feature-enhanced network with transformer for Pedestrian Detection
AU - Zhou, Pengyao
AU - Ning, Xin
AU - Lv, Meibo
AU - Zhang, Lei
AU - Zhang, Buhong
AU - Wen, Zhiwen
N1 - Publisher Copyright:
© 2025 IEEE.
PY - 2025
Y1 - 2025
N2 - The issue of crowdedness caused by overlap among similar objects represents a significant challenge in the field of two-dimensional visual object detection. However, the adoption of end-to-end and binary classification approaches has resulted in existing DETR-based detectors being heavily reliant on positional encoding. To address these issues, we propose a feature enhancement network based on positional encoding correction of overlapping regions. First, considering the limitations of the encoder in extracting and discriminating overlapping regions, we introduce an innovative non-parametric Fourier transform module (NPFT). The NPFT incorporates edge information into the encoder, improving its ability to distinguish overlapping from non-overlapping regions while ensuring accurate positional encoding for overlapping targets. Second, to address the insufficient localisation accuracy for overlapping targets in crowded scenes, we propose the squeeze-and-excitation feedforward network (SFFN). By fusing a positional attention mechanism with self-attention mechanisms, the SFFN enhances the decoder's ability to correct the coordinates of query objects.
AB - The issue of crowdedness caused by overlap among similar objects represents a significant challenge in the field of two-dimensional visual object detection. However, the adoption of end-to-end and binary classification approaches has resulted in existing DETR-based detectors being heavily reliant on positional encoding. To address these issues, we propose a feature enhancement network based on positional encoding correction of overlapping regions. First, considering the limitations of the encoder in extracting and discriminating overlapping regions, we introduce an innovative non-parametric Fourier transform module (NPFT). The NPFT incorporates edge information into the encoder, improving its ability to distinguish overlapping from non-overlapping regions while ensuring accurate positional encoding for overlapping targets. Second, to address the insufficient localisation accuracy for overlapping targets in crowded scenes, we propose the squeeze-and-excitation feedforward network (SFFN). By fusing a positional attention mechanism with self-attention mechanisms, the SFFN enhances the decoder's ability to correct the coordinates of query objects.
KW - crowded scenes
KW - feature enhancement (FE)
KW - Fourier transform
KW - pedestrian detection
KW - Transformer
UR - http://www.scopus.com/inward/record.url?scp=85218721839&partnerID=8YFLogxK
U2 - 10.1109/TIM.2025.3544365
DO - 10.1109/TIM.2025.3544365
M3 - Article
AN - SCOPUS:85218721839
SN - 0018-9456
JO - IEEE Transactions on Instrumentation and Measurement
JF - IEEE Transactions on Instrumentation and Measurement
ER -