Addressing Information Inequality for Text-Based Person Search via Pedestrian-Centric Visual Denoising and Bias-Aware Alignments

Liying Gao, Kai Niu, Bingliang Jiao, Peng Wang, Yanning Zhang

Research output: Contribution to journal, Article, peer-review

13 Scopus citations

Abstract

Text-based person search is an important task in video surveillance that aims to retrieve the pedestrian images matching a given textual description. In this fine-grained retrieval task, accurately matching information across modalities is essential yet challenging. However, existing methods usually ignore the information inequality between the two modalities, which can make cross-modal matching considerably harder. Specifically, the images inevitably contain pedestrian-irrelevant noise such as background and occlusion, while the descriptions may be biased toward only part of the pedestrian content shown in the images. To address this, we propose a Text-Guided Denoising and Alignment (TGDA) model that alleviates the information inequality and enables effective cross-modal matching. In TGDA, we first design a prototype-based denoising module, which integrates pedestrian knowledge from the textual features into a prototype vector and uses it as guidance to filter pedestrian-irrelevant noise out of the visual features. We then introduce a bias-aware alignment module, which guides the model to consistently focus on the description-biased pedestrian content in the cross-modal features. Extensive experiments validate the effectiveness of both modules, and TGDA achieves state-of-the-art performance on various related benchmarks.
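The idea of text-guided visual denoising can be illustrated with a toy NumPy sketch. This is not the authors' TGDA implementation: the way the prototype is built (attention pooling of word features around their mean), the softmax weighting of image patches, the `temperature` value, and all function names are assumptions chosen for illustration, and the bias-aware alignment module is not modeled. The sketch only shows the general mechanism of using a text-derived prototype to down-weight image regions that the description does not support.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to (approximately) unit L2 norm along `axis`."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()


def text_prototype(word_feats):
    """Hypothetical prototype: attention-pool word features (num_words, dim)
    into one vector, scoring each word by its dot product with the mean word."""
    scores = word_feats @ word_feats.mean(axis=0)
    return softmax(scores) @ word_feats


def denoise_visual(patch_feats, prototype, temperature=0.1):
    """Pool patch features (num_patches, dim) with weights derived from their
    cosine similarity to the prototype, so patches unlike the text (e.g.
    background, occlusion) contribute little to the global visual feature."""
    sims = l2_normalize(patch_feats) @ l2_normalize(prototype)
    weights = softmax(sims / temperature)
    return weights @ patch_feats, weights


def match_score(patch_feats, word_feats):
    """Cosine similarity between the denoised image feature and the mean text feature."""
    visual, weights = denoise_visual(patch_feats, text_prototype(word_feats))
    text = word_feats.mean(axis=0)
    return float(l2_normalize(visual) @ l2_normalize(text)), weights


# Toy example: one patch matches the description, two are irrelevant noise.
words = np.array([[1.0, 0.0, 0.0, 0.0],
                  [0.9, 0.1, 0.0, 0.0]])
patches = np.array([[1.0, 0.0, 0.0, 0.0],   # pedestrian
                    [0.0, 0.0, 1.0, 0.0],   # background
                    [0.0, 0.0, 0.0, 1.0]])  # occlusion
score, weights = match_score(patches, words)
# Baseline without denoising: plain mean pooling over all patches.
naive = float(l2_normalize(patches.mean(axis=0)) @ l2_normalize(words.mean(axis=0)))
print(weights.round(3), round(score, 3), round(naive, 3))
```

In this toy setting the background and occlusion patches receive near-zero weight, so the text-guided score is noticeably higher than the score obtained by plain mean pooling over all patches.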

Original language: English
Pages (from-to): 7884-7899
Number of pages: 16
Journal: IEEE Transactions on Circuits and Systems for Video Technology
Volume: 33
Issue number: 12
State: Published - 1 Dec 2023

Keywords

  • Text-based person search
  • information inequality
  • text-guided denoising and alignment
