
Cross-modal Co-occurrence Attributes Alignments for Person Search by Language

  • Kai Niu
  • Linjiang Huang
  • Yan Huang
  • Peng Wang
  • Liang Wang
  • Yanning Zhang
  • Northwestern Polytechnical University Xian
  • Chinese University of Hong Kong
  • CAS - Institute of Automation

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-reviewed

30 Citations (Scopus)

Abstract

Person search by language refers to retrieving pedestrian images of interest based on a free-form natural language description, which has important applications in smart video surveillance. Although great efforts have been made to align images with sentences, the challenge of reporting bias, i.e., attributes being only partially matched across modalities, still introduces substantial noise and seriously degrades retrieval accuracy. To address this challenge, we propose a novel cross-modal matching method named Cross-modal Co-occurrence Attributes Alignments (C2A2), which can better deal with noise and obtain significant improvements in retrieval performance for person search by language. First, we construct visual and textual attribute dictionaries relying on matrix decomposition, and carry out cross-modal alignments using denoising reconstruction features to address the noise from pedestrian-unrelated elements. Second, we re-gather pixels of the image and words of the sentence under the guidance of the learned attribute dictionaries, to adaptively constitute more discriminative co-occurrence attributes in both modalities. The re-gathered co-occurrence attributes are carefully captured by imposing explicit cross-modal one-to-one alignments which consider relations across modalities, better alleviating the noise from non-corresponding attributes. The whole C2A2 method can be trained end-to-end without any pre-processing, i.e., it requires negligible additional computational overhead. It significantly outperforms the existing solutions and achieves new state-of-the-art retrieval performance on two large-scale benchmarks, the CUHK-PEDES and RSTPReid datasets.
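The abstract gives no implementation details, so the following is only a minimal sketch of the second step (re-gathering local features under dictionary guidance, then aligning attributes one-to-one), not the authors' C2A2 implementation. It assumes the attribute dictionaries are already available (the matrix-decomposition step is omitted), uses softmax-weighted pooling as a stand-in for "re-gathering", and scores a pair by averaged cosine similarity. All function names and the toy 2-D features are hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cosine(u, v):
    norm = math.sqrt(dot(u, u) * dot(v, v))
    return dot(u, v) / norm if norm > 0 else 0.0

def softmax(scores):
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def gather_attributes(local_feats, dictionary):
    """Pool local features (pixels or words) into one vector per dictionary atom.

    Assumption: affinity to an atom is a dot product, and pooling is a
    softmax-weighted average. The paper may use a different mechanism.
    """
    attributes = []
    for atom in dictionary:
        weights = softmax([dot(f, atom) for f in local_feats])
        pooled = [sum(w * f[d] for w, f in zip(weights, local_feats))
                  for d in range(len(atom))]
        attributes.append(pooled)
    return attributes

def one_to_one_similarity(image_attrs, text_attrs):
    """Average cosine similarity between the k-th image and k-th text attribute."""
    pairs = list(zip(image_attrs, text_attrs))
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)

# Toy data: two attribute atoms per modality, two local features per input.
visual_dict = [[1.0, 0.0], [0.0, 1.0]]
textual_dict = [[1.0, 0.0], [0.0, 1.0]]
pixels = [[2.0, 0.0], [0.0, 2.0]]
matching_words = [[3.0, 0.0], [0.0, 3.0]]
unrelated_words = [[-3.0, 0.0], [0.0, -3.0]]

image_attrs = gather_attributes(pixels, visual_dict)
score_match = one_to_one_similarity(
    image_attrs, gather_attributes(matching_words, textual_dict))
score_other = one_to_one_similarity(
    image_attrs, gather_attributes(unrelated_words, textual_dict))
print(score_match > score_other)
```

In this toy setup the description whose words point the same way as the image's pixels receives the higher score, which is the behaviour a retrieval ranking would rely on. A trained system would learn the dictionaries and features jointly rather than fixing them by hand.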

Original language: English
Title of host publication: MM 2022 - Proceedings of the 30th ACM International Conference on Multimedia
Publisher: Association for Computing Machinery, Inc
Pages: 4426-4434
Number of pages: 9
ISBN (Electronic): 9781450392037
DOI
Publication status: Published - 10 Oct 2022
Event: 30th ACM International Conference on Multimedia, MM 2022 - Lisboa, Portugal
Duration: 10 Oct 2022 to 14 Oct 2022

Publication series

Name: MM 2022 - Proceedings of the 30th ACM International Conference on Multimedia

Conference

Conference: 30th ACM International Conference on Multimedia, MM 2022
Country/Territory: Portugal
City: Lisboa
Period: 10/10/22 to 14/10/22
