Fine-Granularity Alignment for Text-Based Person Retrieval Via Semantics-Centric Visual Division

  • Zhimin Wei
  • Zhipeng Zhang
  • Peng Wu
  • Ji Wang
  • Peng Wang
  • Yanning Zhang
  • Northwestern Polytechnical University Xian

Research output: Contribution to journal, Article, peer-reviewed

11 Citations (Scopus)

Abstract

Text-based Person Retrieval aims to find a target pedestrian image in video surveillance footage or a large image database, given a text description. Previous works have recognized the importance of mining local information in images and descriptions and of performing fine-grained alignment. These approaches locate local visual regions either by hard division or with auxiliary networks. However, neither strategy is flexible enough to handle diverse images, and both may even introduce noise. Meanwhile, Vision-Language Pre-training models such as CLIP exhibit strong generalization and zero-shot abilities, which offer a viable way to address this issue. In this paper, we propose a novel Fine-Granularity Alignment model with Semantics-Centric Visual Division (SCVD). Our method contains a Semantics Deconstructor (SD), a Cross-modal Guided Interaction (CGI) module, and a Dynamic Focus Alignment (DFA) module. The SD extracts fine-grained semantic prompts from the raw description in a form that is easy for CLIP to understand. In CGI, we propose a Text-Guided Visual Localization (TVL) module, which generates local visual representations according to the semantic prompts, and a Vision-Guided Semantics Reconstruction (VSR) module, which integrates the prompts into the textual representation. Finally, the DFA aligns the fine-grained visual and textual information. Extensive experiments demonstrate that the proposed framework significantly outperforms current state-of-the-art methods on the Rank@1 metric on three benchmarks, with absolute gains of 6.56%, 8.93%, and 11.53%, respectively. Our code is available at https://github.com/tujun233/SCVD.git.
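For readers unfamiliar with the Rank@1 metric cited in the abstract, the following is a minimal sketch of how Rank@k is conventionally computed for text-to-image person retrieval. It is not taken from the authors' code: the function names `cosine` and `rank_at_k`, the use of cosine similarity, and the toy embeddings are all assumptions made for illustration, and the paper's exact evaluation protocol may differ.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def rank_at_k(text_feats, text_ids, image_feats, image_ids, k=1):
    """Fraction of text queries whose top-k retrieved images include
    at least one image with the same person identity (hypothetical helper)."""
    hits = 0
    for query, pid in zip(text_feats, text_ids):
        scores = [cosine(query, img) for img in image_feats]
        # Gallery indices sorted from most to least similar to the query.
        order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        if any(image_ids[i] == pid for i in order[:k]):
            hits += 1
    return hits / len(text_feats)

# Toy data (assumed, not from the paper): three gallery images, two text queries.
image_feats = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
image_ids = [0, 1, 2]
text_feats = [[0.9, 0.1], [0.6, 0.8]]
text_ids = [0, 1]

r1 = rank_at_k(text_feats, text_ids, image_feats, image_ids, k=1)
r2 = rank_at_k(text_feats, text_ids, image_feats, image_ids, k=2)
print(r1, r2)
```

In this toy setup the first query retrieves its matching identity at rank 1, while the second only finds it at rank 2, so Rank@1 is lower than Rank@2. An "absolute gain" in Rank@1, as reported in the abstract, is a difference in percentage points between two such scores.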

Original language: English
Pages (from-to): 8242-8252
Number of pages: 11
Journal: IEEE Transactions on Circuits and Systems for Video Technology
Volume: 34
Issue number: 9
DOI
Publication status: Published - 2024
