SF-Former: Feature-enhanced network with transformer for Pedestrian Detection

Pengyao Zhou, Xin Ning, Meibo Lv, Lei Zhang, Buhong Zhang, Zhiwen Wen

Research output: Contribution to journal (Article, peer-reviewed)

Abstract

Crowding caused by overlap among similar objects is a significant challenge in two-dimensional visual object detection. Moreover, because they adopt end-to-end training and binary classification, existing DETR-based detectors rely heavily on positional encoding. To address these issues, we propose a feature-enhancement network based on positional-encoding correction for overlapping regions. First, because the encoder is limited in extracting and discriminating overlapping regions, we introduce a non-parametric Fourier transform module (NPFT). The NPFT incorporates edge information into the encoder, improving its ability to distinguish overlapping from non-overlapping regions and ensuring accurate positional encoding for overlapping targets. Second, to address insufficient localisation accuracy for overlapping targets in crowded scenes, we propose the squeeze-and-excitation feedforward network (SFFN). By fusing a positional attention mechanism with the self-attention mechanism, the SFFN enhances the decoder's ability to correct the coordinates of object queries.
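The abstract describes the two modules only at a high level. As a rough illustration of the general ideas, and not the authors' implementation, the Python/NumPy sketch below shows (a) a non-parametric Fourier high-pass filter that extracts edge-like detail from a feature map and adds it back before the encoder, and (b) a standard squeeze-and-excitation channel gate applied to the output of a feed-forward block. The cutoff frequency, the additive fusion, the mean-over-tokens squeeze and all function names (fourier_edge_map, enhance_encoder_input, se_gate, se_ffn) are hypothetical assumptions; the SFFN's fusion of positional attention with self-attention is not modelled here.

```python
# Hypothetical sketch of the ideas named in the abstract; not the SF-Former code.
import numpy as np


def fourier_edge_map(feat: np.ndarray, cutoff: float = 0.1) -> np.ndarray:
    """Non-parametric high-pass filter (no learned weights).

    feat: feature map of shape (C, H, W).
    cutoff: assumed radial frequency threshold in cycles/sample.
    Returns the high-frequency (edge-like) component of each channel.
    """
    _, h, w = feat.shape
    spectrum = np.fft.fft2(feat, axes=(-2, -1))
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    radius = np.sqrt(fx ** 2 + fy ** 2)
    high_pass = (radius > cutoff).astype(float)  # zero out DC and low frequencies
    return np.fft.ifft2(spectrum * high_pass, axes=(-2, -1)).real


def enhance_encoder_input(feat: np.ndarray, alpha: float = 1.0) -> np.ndarray:
    """Add edge information back into the features (assumed fusion: weighted sum)."""
    return feat + alpha * fourier_edge_map(feat)


def se_gate(tokens: np.ndarray, w_reduce: np.ndarray, w_expand: np.ndarray) -> np.ndarray:
    """Squeeze-and-excitation channel gate for tokens of shape (N, C)."""
    squeezed = tokens.mean(axis=0)                    # squeeze over tokens -> (C,)
    hidden = np.maximum(squeezed @ w_reduce, 0.0)     # reduce + ReLU -> (C // r,)
    gate = 1.0 / (1.0 + np.exp(-(hidden @ w_expand)))  # expand + sigmoid -> (C,)
    return tokens * gate                              # rescale each channel by (0, 1)


def se_ffn(tokens, w1, w2, w_reduce, w_expand):
    """Feed-forward block whose output is channel-gated, plus a residual connection."""
    ffn_out = np.maximum(tokens @ w1, 0.0) @ w2
    return tokens + se_gate(ffn_out, w_reduce, w_expand)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feat = rng.standard_normal((8, 16, 16))      # (C, H, W)
    enhanced = enhance_encoder_input(feat)
    tokens = enhanced.reshape(8, -1).T           # (H*W, C) token sequence
    c, hidden, r = 8, 32, 4
    out = se_ffn(
        tokens,
        rng.standard_normal((c, hidden)) * 0.1,
        rng.standard_normal((hidden, c)) * 0.1,
        rng.standard_normal((c, c // r)),
        rng.standard_normal((c // r, c)),
    )
    print(out.shape)  # (256, 8)
```

Because the high-pass mask zeroes the DC term, a constant feature map yields an all-zero edge map, which is a quick sanity check; the SE gate only rescales channels by factors in (0, 1), so it reweights rather than replaces the feed-forward output.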

Original language: English
Journal: IEEE Transactions on Instrumentation and Measurement
Status: Accepted/In press, 2025

Keywords

  • crowded scenes
  • feature enhancement (FE)
  • Fourier transform
  • pedestrian detection
  • Transformer
