TY - JOUR
T1 - Addressing Information Inequality for Text-Based Person Search via Pedestrian-Centric Visual Denoising and Bias-Aware Alignments
AU - Gao, Liying
AU - Niu, Kai
AU - Jiao, Bingliang
AU - Wang, Peng
AU - Zhang, Yanning
N1 - Publisher Copyright:
© 2023 IEEE.
PY - 2023/12/1
Y1 - 2023/12/1
AB - Text-based person search is an important task in video surveillance that aims to retrieve pedestrian images corresponding to a given natural-language description. In this fine-grained retrieval task, accurate cross-modal information matching is essential yet challenging. However, existing methods usually ignore the information inequality between modalities, which can greatly hinder cross-modal matching. Specifically, images inevitably contain pedestrian-irrelevant noise such as background and occlusion, while descriptions may be biased toward only part of the pedestrian content in images. With this in mind, we propose a Text-Guided Denoising and Alignment (TGDA) model to alleviate the information inequality and realize effective cross-modal matching. In TGDA, we first design a prototype-based denoising module, which integrates pedestrian knowledge from textual features into a prototype vector and uses it as guidance to filter pedestrian-irrelevant noise out of the visual features. Thereafter, a bias-aware alignment module is introduced to guide our model to focus consistently on the description-biased pedestrian content in cross-modal features. Extensive experiments validate the effectiveness of both modules, and TGDA achieves state-of-the-art performance on multiple related benchmarks.
KW - Text-based person search
KW - information inequality
KW - text-guided denoising and alignment
UR - http://www.scopus.com/inward/record.url?scp=85159846411&partnerID=8YFLogxK
U2 - 10.1109/TCSVT.2023.3273719
DO - 10.1109/TCSVT.2023.3273719
M3 - Article
AN - SCOPUS:85159846411
SN - 1051-8215
VL - 33
SP - 7884
EP - 7899
JO - IEEE Transactions on Circuits and Systems for Video Technology
JF - IEEE Transactions on Circuits and Systems for Video Technology
IS - 12
ER -
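
The abstract's prototype-based denoising module can be pictured with a minimal sketch. The PyTorch code below is not the authors' implementation; it is a hypothetical illustration of the general idea described above: attention-pool token-level textual features into a single pedestrian prototype vector, then use that prototype to gate pedestrian-irrelevant regions (background, occlusion) out of the visual features. The class name TextGuidedDenoising, the pooling scheme, and all dimensions are assumptions for illustration only.

# Hypothetical sketch of text-guided prototype denoising (not the authors' code).
# Idea from the abstract: distill textual pedestrian knowledge into one prototype
# vector, then use it to suppress pedestrian-irrelevant visual features.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TextGuidedDenoising(nn.Module):
    def __init__(self, dim: int = 512):
        super().__init__()
        self.query = nn.Parameter(torch.randn(1, dim))  # learnable pooling query
        self.gate = nn.Linear(dim, 1)                   # relevance score per region

    def forward(self, text_tokens, visual_regions):
        # text_tokens:    (B, Lt, D) token-level textual features
        # visual_regions: (B, Lv, D) region-level visual features
        # 1) Attention-pool the text tokens into a pedestrian prototype (B, D).
        scores = text_tokens @ self.query.t() / text_tokens.size(-1) ** 0.5
        attn = torch.softmax(scores, dim=1)             # (B, Lt, 1)
        prototype = (attn * text_tokens).sum(dim=1)     # (B, D)
        # 2) Score each visual region against the prototype and gate it, softly
        #    filtering out background/occlusion regions before alignment.
        sim = F.cosine_similarity(visual_regions, prototype.unsqueeze(1), dim=-1)
        gates = torch.sigmoid(self.gate(visual_regions)).squeeze(-1) * sim.clamp(min=0)
        denoised = visual_regions * gates.unsqueeze(-1)
        return denoised, prototype

A toy invocation, with batch size 2, 24 text tokens, and 48 visual regions:

m = TextGuidedDenoising(dim=512)
denoised, proto = m(torch.randn(2, 24, 512), torch.randn(2, 48, 512))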