TY - GEN
T1 - Text-based Person Search in Full Images via Semantic-Driven Proposal Generation
AU - Zhang, Shizhou
AU - Cheng, De
AU - Luo, Wenlong
AU - Xing, Yinghui
AU - Long, Duo
AU - Li, Hao
AU - Niu, Kai
AU - Liang, Guoqiang
AU - Zhang, Yanning
N1 - Publisher Copyright:
© 2023 ACM.
PY - 2023/11/2
Y1 - 2023/11/2
AB - Finding target persons in full scene images with a query of text description has important practical applications in intelligent video surveillance. However, existing text-based person retrieval methods mainly focus on cross-modal matching between the query text descriptions and a gallery of cropped pedestrian images, which differs from real-world scenarios where bounding boxes are not available. To close this gap, we study the problem of text-based person search in full images by proposing a new end-to-end learning framework which jointly optimizes the pedestrian detection, identification, and visual-semantic feature embedding tasks. To take full advantage of the query text, the semantic features are leveraged to instruct the Region Proposal Network to pay more attention to the text-described proposals. In addition, a cross-scale visual-semantic embedding mechanism is utilized to further improve performance. To validate the proposed method, we collect and annotate two large-scale benchmark datasets based on the widely adopted image-based person search datasets CUHK-SYSU and PRW. Comprehensive experiments are conducted on the two datasets, and compared with the baseline methods, our method achieves state-of-the-art performance.
KW - cross-scale alignment
KW - semantic-driven RPN
KW - text-based person search
UR - http://www.scopus.com/inward/record.url?scp=85178584785&partnerID=8YFLogxK
U2 - 10.1145/3606041.3618058
DO - 10.1145/3606041.3618058
M3 - Conference contribution
AN - SCOPUS:85178584785
T3 - HCMA 2023 - Proceedings of the 4th International Workshop on Human-centric Multimedia Analysis, Co-located with: MM 2023
SP - 5
EP - 14
BT - HCMA 2023 - Proceedings of the 4th International Workshop on Human-centric Multimedia Analysis, Co-located with: MM 2023
PB - Association for Computing Machinery, Inc
T2 - 4th International Workshop on Human-centric Multimedia Analysis, HCMA 2023
Y2 - 2 November 2023
ER -