Cross-Platform Video Person ReID: A New Benchmark Dataset and Adaptation Approach

Shizhou Zhang; Wenlong Luo; De Cheng; Qingchun Yang; Lingyan Ran; Yinghui Xing; Yanning Zhang

doi:10.1007/978-3-031-73383-3_16

School of Computer Science

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

In this paper, we construct a large-scale benchmark dataset for Ground-to-Aerial Video-based person Re-Identification, named G2A-VReID, which comprises 185,907 images and 5,576 tracklets, featuring 2,788 distinct identities. To our knowledge, this is the first dataset for video ReID under Ground-to-Aerial scenarios. G2A-VReID dataset has the following characteristics: 1) Drastic view changes; 2) Large number of annotated identities; 3) Rich outdoor scenarios; 4) Huge difference in resolution. Additionally, we propose a new benchmark approach for cross-platform ReID by transforming the cross-platform visual alignment problem into visual-semantic alignment through vision-language model (i.e., CLIP) and applying a parameter-efficient Video Set-Level-Adapter module to adapt image-based foundation model to video ReID tasks, termed VSLA-CLIP. Besides, to further reduce the great discrepancy across the platforms, we also devise the platform-bridge prompts for efficient visual feature alignment. Extensive experiments demonstrate the superiority of the proposed method on all existing video ReID datasets and our proposed G2A-VReID dataset. The code and datasets are available at https://github.com/FHR-L/VSLA-CLIP.

Original language	English
Title of host publication	Computer Vision – ECCV 2024 - 18th European Conference, Proceedings
Editors	Aleš Leonardis, Elisa Ricci, Stefan Roth, Olga Russakovsky, Torsten Sattler, Gül Varol
Publisher	Springer Science and Business Media Deutschland GmbH
Pages	270-287
Number of pages	18
ISBN (Print)	9783031733826
DOI	https://doi.org/10.1007/978-3-031-73383-3_16
Publication status	Published - 2025
Event	18th European Conference on Computer Vision, ECCV 2024 - Milan, Italy; Duration: 29 Sep 2024 → 4 Oct 2024

Publication series

Name	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume	15085 LNCS
ISSN (Print)	0302-9743
ISSN (Electronic)	1611-3349

Conference

Conference	18th European Conference on Computer Vision, ECCV 2024
Country/Territory	Italy
City	Milan
Period	29/09/24 → 4/10/24

Cite this

Zhang, S., Luo, W., Cheng, D., Yang, Q., Ran, L., Xing, Y., & Zhang, Y. (2025). Cross-Platform Video Person ReID: A New Benchmark Dataset and Adaptation Approach. In A. Leonardis, E. Ricci, S. Roth, O. Russakovsky, T. Sattler, & G. Varol (Eds.), Computer Vision – ECCV 2024 - 18th European Conference, Proceedings (pp. 270-287). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 15085 LNCS). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-73383-3_16

Zhang, Shizhou ; Luo, Wenlong ; Cheng, De et al. / Cross-Platform Video Person ReID : A New Benchmark Dataset and Adaptation Approach. Computer Vision – ECCV 2024 - 18th European Conference, Proceedings. ed. / Aleš Leonardis ; Elisa Ricci ; Stefan Roth ; Olga Russakovsky ; Torsten Sattler ; Gül Varol. Springer Science and Business Media Deutschland GmbH, 2025. pp. 270-287 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).

@inproceedings{4a1f35b590b44b399b75dbaab0eb54ba,

title = "Cross-Platform Video Person ReID: A New Benchmark Dataset and Adaptation Approach",

abstract = "In this paper, we construct a large-scale benchmark dataset for Ground-to-Aerial Video-based person Re-Identification, named G2A-VReID, which comprises 185,907 images and 5,576 tracklets, featuring 2,788 distinct identities. To our knowledge, this is the first dataset for video ReID under Ground-to-Aerial scenarios. G2A-VReID dataset has the following characteristics: 1) Drastic view changes; 2) Large number of annotated identities; 3) Rich outdoor scenarios; 4) Huge difference in resolution. Additionally, we propose a new benchmark approach for cross-platform ReID by transforming the cross-platform visual alignment problem into visual-semantic alignment through vision-language model (i.e., CLIP) and applying a parameter-efficient Video Set-Level-Adapter module to adapt image-based foundation model to video ReID tasks, termed VSLA-CLIP. Besides, to further reduce the great discrepancy across the platforms, we also devise the platform-bridge prompts for efficient visual feature alignment. Extensive experiments demonstrate the superiority of the proposed method on all existing video ReID datasets and our proposed G2A-VReID dataset. The code and datasets are available at https://github.com/FHR-L/VSLA-CLIP.",

keywords = "Dataset, Ground-to-Aerial, Person Re-Identification",

author = "Shizhou Zhang and Wenlong Luo and De Cheng and Qingchun Yang and Lingyan Ran and Yinghui Xing and Yanning Zhang",

note = "Publisher Copyright: {\textcopyright} The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.; 18th European Conference on Computer Vision, ECCV 2024 ; Conference date: 29-09-2024 Through 04-10-2024",

year = "2025",

doi = "10.1007/978-3-031-73383-3_16",

language = "English",

isbn = "9783031733826",

series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

publisher = "Springer Science and Business Media Deutschland GmbH",

pages = "270--287",

editor = "Ale{\v s} Leonardis and Elisa Ricci and Stefan Roth and Olga Russakovsky and Torsten Sattler and G{\"u}l Varol",

booktitle = "Computer Vision – ECCV 2024 - 18th European Conference, Proceedings",

}

Zhang, S, Luo, W, Cheng, D, Yang, Q, Ran, L, Xing, Y & Zhang, Y 2025, Cross-Platform Video Person ReID: A New Benchmark Dataset and Adaptation Approach. in A Leonardis, E Ricci, S Roth, O Russakovsky, T Sattler & G Varol (eds), Computer Vision – ECCV 2024 - 18th European Conference, Proceedings. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 15085 LNCS, Springer Science and Business Media Deutschland GmbH, pp. 270-287, 18th European Conference on Computer Vision, ECCV 2024, Milan, Italy, 29/09/24. https://doi.org/10.1007/978-3-031-73383-3_16

Cross-Platform Video Person ReID: A New Benchmark Dataset and Adaptation Approach. / Zhang, Shizhou; Luo, Wenlong; Cheng, De et al.
Computer Vision – ECCV 2024 - 18th European Conference, Proceedings. ed. / Aleš Leonardis; Elisa Ricci; Stefan Roth; Olga Russakovsky; Torsten Sattler; Gül Varol. Springer Science and Business Media Deutschland GmbH, 2025. pp. 270-287 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 15085 LNCS).

TY - GEN

T1 - Cross-Platform Video Person ReID

T2 - 18th European Conference on Computer Vision, ECCV 2024

AU - Zhang, Shizhou

AU - Luo, Wenlong

AU - Cheng, De

AU - Yang, Qingchun

AU - Ran, Lingyan

AU - Xing, Yinghui

AU - Zhang, Yanning

N1 - Publisher Copyright: © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.

PY - 2025

Y1 - 2025

AB - In this paper, we construct a large-scale benchmark dataset for Ground-to-Aerial Video-based person Re-Identification, named G2A-VReID, which comprises 185,907 images and 5,576 tracklets, featuring 2,788 distinct identities. To our knowledge, this is the first dataset for video ReID under Ground-to-Aerial scenarios. G2A-VReID dataset has the following characteristics: 1) Drastic view changes; 2) Large number of annotated identities; 3) Rich outdoor scenarios; 4) Huge difference in resolution. Additionally, we propose a new benchmark approach for cross-platform ReID by transforming the cross-platform visual alignment problem into visual-semantic alignment through vision-language model (i.e., CLIP) and applying a parameter-efficient Video Set-Level-Adapter module to adapt image-based foundation model to video ReID tasks, termed VSLA-CLIP. Besides, to further reduce the great discrepancy across the platforms, we also devise the platform-bridge prompts for efficient visual feature alignment. Extensive experiments demonstrate the superiority of the proposed method on all existing video ReID datasets and our proposed G2A-VReID dataset. The code and datasets are available at https://github.com/FHR-L/VSLA-CLIP.

KW - Dataset

KW - Ground-to-Aerial

KW - Person Re-Identification

UR - http://www.scopus.com/inward/record.url?scp=85209409569&partnerID=8YFLogxK

U2 - 10.1007/978-3-031-73383-3_16

DO - 10.1007/978-3-031-73383-3_16

M3 - Conference contribution

AN - SCOPUS:85209409569

SN - 9783031733826

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 270

EP - 287

BT - Computer Vision – ECCV 2024 - 18th European Conference, Proceedings

A2 - Leonardis, Aleš

A2 - Ricci, Elisa

A2 - Roth, Stefan

A2 - Russakovsky, Olga

A2 - Sattler, Torsten

A2 - Varol, Gül

PB - Springer Science and Business Media Deutschland GmbH

Y2 - 29 September 2024 through 4 October 2024

ER -

Zhang S, Luo W, Cheng D, Yang Q, Ran L, Xing Y et al. Cross-Platform Video Person ReID: A New Benchmark Dataset and Adaptation Approach. In Leonardis A, Ricci E, Roth S, Russakovsky O, Sattler T, Varol G, editors, Computer Vision – ECCV 2024 - 18th European Conference, Proceedings. Springer Science and Business Media Deutschland GmbH. 2025. p. 270-287. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). doi: 10.1007/978-3-031-73383-3_16
