SF-Former: Feature-Enhanced Network With Transformer for Pedestrian Detection

Pengyao Zhou, Xin Ning, Meibo Lv, Lei Zhang, Buhong Zhang, Zhiwen Wen

Research output: Contribution to journal › Article › peer-review

Abstract

Crowdedness caused by overlap among similar objects is a significant challenge in 2-D visual object detection. Moreover, because they adopt end-to-end training and binary classification, existing DETR-based detectors rely heavily on positional encoding. To address these issues, we propose a feature enhancement (FE) network based on positional encoding correction of overlapping regions. First, because the encoder is limited in extracting and discriminating overlapping regions, we introduce a nonparametric Fourier transform (NPFT) module. The NPFT incorporates edge information into the encoder, improving its ability to distinguish overlapping from nonoverlapping regions while ensuring accurate positional encoding for overlapping targets. Second, to address the insufficient localization accuracy for overlapping targets in crowded scenes, we propose the squeeze-and-excitation feedforward network (SFFN). By fusing a positional attention mechanism with self-attention, the SFFN enhances the decoder's ability to correct the coordinates of query objects.
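The abstract does not give the NPFT or SFFN equations, so the following is only a minimal Python sketch of the two generic ideas it names, under stated assumptions: (1) a parameter-free Fourier high-pass that zeroes the low frequencies of a feature map and adds the remaining high-frequency (edge) residual back to the input, and (2) a standard squeeze-and-excitation gate (average over tokens, ReLU bottleneck, sigmoid) that rescales each channel of the token features. The names `high_pass`, `edge_enhance`, `se_gate`, the cutoff `radius`, the fusion weight `alpha`, and the weight matrices `w1`/`w2` are hypothetical illustrations, not the authors' implementation.

```python
import cmath
import math
from typing import List

Matrix = List[List[float]]


def dft2(x: Matrix) -> List[List[complex]]:
    """Naive 2-D DFT of a real H x W map (O(H^2 W^2); only for tiny examples)."""
    h, w = len(x), len(x[0])
    return [[sum(x[m][n] * cmath.exp(-2j * math.pi * (u * m / h + v * n / w))
                 for m in range(h) for n in range(w))
             for v in range(w)] for u in range(h)]


def idft2(spec: List[List[complex]]) -> Matrix:
    """Inverse of dft2, keeping only the real part."""
    h, w = len(spec), len(spec[0])
    return [[(sum(spec[u][v] * cmath.exp(2j * math.pi * (u * m / h + v * n / w))
                  for u in range(h) for v in range(w)) / (h * w)).real
             for n in range(w)] for m in range(h)]


def high_pass(x: Matrix, radius: float) -> Matrix:
    """Zero every frequency within `radius` of DC (wrap-around distance).
    No learned parameters; `radius` is a hypothetical hand-picked cutoff."""
    h, w = len(x), len(x[0])
    spec = dft2(x)
    for u in range(h):
        for v in range(w):
            if math.hypot(min(u, h - u), min(v, w - v)) <= radius:
                spec[u][v] = 0
    return idft2(spec)


def edge_enhance(x: Matrix, radius: float = 1.0, alpha: float = 1.0) -> Matrix:
    """Hypothetical fusion rule: x + alpha * high-frequency residual."""
    hp = high_pass(x, radius)
    return [[x[i][j] + alpha * hp[i][j] for j in range(len(x[0]))]
            for i in range(len(x))]


def sigmoid(t: float) -> float:
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def se_gate(tokens: Matrix, w1: Matrix, w2: Matrix) -> Matrix:
    """Squeeze-and-excitation over the C channels of N tokens.
    w1 is r x C (bottleneck), w2 is C x r; both hypothetical weights."""
    n, c = len(tokens), len(tokens[0])
    s = [sum(tok[k] for tok in tokens) / n for k in range(c)]          # squeeze
    z = [max(0.0, sum(row[k] * s[k] for k in range(c))) for row in w1]  # ReLU
    g = [sigmoid(sum(row[j] * z[j] for j in range(len(z)))) for row in w2]
    return [[tok[k] * g[k] for k in range(c)] for tok in tokens]        # excite
```

With `radius=0`, `high_pass` removes only the DC term, so it returns the map minus its mean; a constant map therefore carries no edge signal and `edge_enhance` leaves it unchanged. In `se_gate`, an all-zero `w1` makes every gate `sigmoid(0) = 0.5`, which halves all channels; learned weights would instead emphasize some channels over others.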

Original language: English
Article number: 5012910
Journal: IEEE Transactions on Instrumentation and Measurement
Volume: 74
State: Published - 2025

Keywords

  • Crowded scenes
  • feature enhancement (FE)
  • Fourier transform
  • pedestrian detection
  • transformer
