An Overview of Text-Based Person Search: Recent Advances and Future Directions

Kai Niu, Yanyi Liu, Yuzhou Long, Yan Huang, Liang Wang, Yanning Zhang

Research output: Contribution to journal › Article › peer-review


Abstract

Owing to its practical significance for smart video surveillance systems, Text-Based Person Search (TBPS) has recently become a research hotspot. TBPS refers to retrieving images of a target pedestrian given natural language descriptions. To help researchers quickly grasp developments in this important task, we comprehensively summarize recent research advances in TBPS from two perspectives, i.e., Feature Extraction (FE) and Semantic Alignments (SA). Specifically, FE mainly consists of pre-processing approaches and end-to-end frameworks, while SA can be broadly divided into cross-modal attention mechanisms, non-attention alignments, training objectives, and generative approaches. We then describe four widely used benchmarks and the evaluation criterion for TBPS, and provide comparisons and analyses of state-of-the-art (SOTA) solutions on these large-scale benchmarks. Finally, we point out several future research directions that remain to be addressed and that would greatly facilitate the practical application of TBPS.

Original language: English
Pages (from-to): 7803-7819
Number of pages: 17
Journal: IEEE Transactions on Circuits and Systems for Video Technology
Volume: 34
Issue number: 9
State: Published - 2024

Keywords

  • Text-based person search
  • cross-modal retrieval
  • feature extraction
  • semantic alignments
  • video surveillance

