Cross-Platform Video Person ReID: A New Benchmark Dataset and Adaptation Approach

Shizhou Zhang; Wenlong Luo; De Cheng; Qingchun Yang; Lingyan Ran; Yinghui Xing; Yanning Zhang

doi:10.1007/978-3-031-73383-3_16

School of Computer Science

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

In this paper, we construct a large-scale benchmark dataset for Ground-to-Aerial Video-based person Re-Identification, named G2A-VReID, which comprises 185,907 images and 5,576 tracklets covering 2,788 distinct identities. To our knowledge, this is the first dataset for video ReID under Ground-to-Aerial scenarios. The G2A-VReID dataset has the following characteristics: 1) drastic view changes; 2) a large number of annotated identities; 3) rich outdoor scenarios; 4) a large difference in resolution. Additionally, we propose a new benchmark approach for cross-platform ReID, termed VSLA-CLIP, which transforms the cross-platform visual alignment problem into visual-semantic alignment through a vision-language model (i.e., CLIP) and applies a parameter-efficient Video Set-Level-Adapter module to adapt the image-based foundation model to video ReID tasks. To further reduce the large discrepancy across platforms, we also devise platform-bridge prompts for efficient visual feature alignment. Extensive experiments demonstrate the superiority of the proposed method on all existing video ReID datasets and our proposed G2A-VReID dataset. The code and datasets are available at https://github.com/FHR-L/VSLA-CLIP.
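
The dataset sizes quoted in the abstract can be turned into a few average ratios as a quick sanity check. The sketch below is arithmetic only: the function and variable names are illustrative and are not taken from the paper or its repository, and the averages say nothing about how tracklets are actually distributed across ground and aerial cameras.

```python
# Figures quoted in the abstract of the G2A-VReID paper.
NUM_IMAGES = 185_907
NUM_TRACKLETS = 5_576
NUM_IDENTITIES = 2_788


def dataset_ratios(images: int, tracklets: int, identities: int) -> dict:
    """Return simple averages derived from the three dataset counts.

    Hypothetical helper for illustration; not part of the released code.
    """
    return {
        "tracklets_per_identity": tracklets / identities,
        "images_per_tracklet": images / tracklets,
        "images_per_identity": images / identities,
    }


ratios = dataset_ratios(NUM_IMAGES, NUM_TRACKLETS, NUM_IDENTITIES)
for name, value in ratios.items():
    print(f"{name}: {value:.2f}")
```

On average this gives exactly two tracklets per identity and roughly 33 images per tracklet.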

Original language	English
Title of host publication	Computer Vision – ECCV 2024 - 18th European Conference, Proceedings
Editors	Aleš Leonardis, Elisa Ricci, Stefan Roth, Olga Russakovsky, Torsten Sattler, Gül Varol
Publisher	Springer Science and Business Media Deutschland GmbH
Pages	270-287
Number of pages	18
ISBN (Print)	9783031733826
DOIs	https://doi.org/10.1007/978-3-031-73383-3_16
State	Published - 2025
Event	18th European Conference on Computer Vision, ECCV 2024 - Milan, Italy Duration: 29 Sep 2024 → 4 Oct 2024

Publication series

Name	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume	15085 LNCS
ISSN (Print)	0302-9743
ISSN (Electronic)	1611-3349

Conference

Conference	18th European Conference on Computer Vision, ECCV 2024
Country/Territory	Italy
City	Milan
Period	29/09/24 → 4/10/24

Keywords

Dataset
Ground-to-Aerial
Person Re-Identification

Cite this

Zhang, S., Luo, W., Cheng, D., Yang, Q., Ran, L., Xing, Y., & Zhang, Y. (2025). Cross-Platform Video Person ReID: A New Benchmark Dataset and Adaptation Approach. In A. Leonardis, E. Ricci, S. Roth, O. Russakovsky, T. Sattler, & G. Varol (Eds.), Computer Vision – ECCV 2024 - 18th European Conference, Proceedings (pp. 270-287). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 15085 LNCS). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-73383-3_16

Zhang, Shizhou ; Luo, Wenlong ; Cheng, De et al. / Cross-Platform Video Person ReID : A New Benchmark Dataset and Adaptation Approach. Computer Vision – ECCV 2024 - 18th European Conference, Proceedings. editor / Aleš Leonardis ; Elisa Ricci ; Stefan Roth ; Olga Russakovsky ; Torsten Sattler ; Gül Varol. Springer Science and Business Media Deutschland GmbH, 2025. pp. 270-287 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).

@inproceedings{4a1f35b590b44b399b75dbaab0eb54ba,

title = "Cross-Platform Video Person ReID: A New Benchmark Dataset and Adaptation Approach",

abstract = "In this paper, we construct a large-scale benchmark dataset for Ground-to-Aerial Video-based person Re-Identification, named G2A-VReID, which comprises 185,907 images and 5,576 tracklets, featuring 2,788 distinct identities. To our knowledge, this is the first dataset for video ReID under Ground-to-Aerial scenarios. G2A-VReID dataset has the following characteristics: 1) Drastic view changes; 2) Large number of annotated identities; 3) Rich outdoor scenarios; 4) Huge difference in resolution. Additionally, we propose a new benchmark approach for cross-platform ReID by transforming the cross-platform visual alignment problem into visual-semantic alignment through vision-language model (i.e., CLIP) and applying a parameter-efficient Video Set-Level-Adapter module to adapt image-based foundation model to video ReID tasks, termed VSLA-CLIP. Besides, to further reduce the great discrepancy across the platforms, we also devise the platform-bridge prompts for efficient visual feature alignment. Extensive experiments demonstrate the superiority of the proposed method on all existing video ReID datasets and our proposed G2A-VReID dataset. The code and datasets are available at https://github.com/FHR-L/VSLA-CLIP.",

keywords = "Dataset, Ground-to-Aerial, Person Re-Identification",

author = "Shizhou Zhang and Wenlong Luo and De Cheng and Qingchun Yang and Lingyan Ran and Yinghui Xing and Yanning Zhang",

note = "Publisher Copyright: {\textcopyright} The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.; 18th European Conference on Computer Vision, ECCV 2024 ; Conference date: 29-09-2024 Through 04-10-2024",

year = "2025",

doi = "10.1007/978-3-031-73383-3_16",

language = "English",

isbn = "9783031733826",

series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

publisher = "Springer Science and Business Media Deutschland GmbH",

pages = "270--287",

editor = "Ale{\v s} Leonardis and Elisa Ricci and Stefan Roth and Olga Russakovsky and Torsten Sattler and G{\"u}l Varol",

booktitle = "Computer Vision – ECCV 2024 - 18th European Conference, Proceedings",

}

Zhang, S, Luo, W, Cheng, D, Yang, Q, Ran, L, Xing, Y & Zhang, Y 2025, Cross-Platform Video Person ReID: A New Benchmark Dataset and Adaptation Approach. in A Leonardis, E Ricci, S Roth, O Russakovsky, T Sattler & G Varol (eds), Computer Vision – ECCV 2024 - 18th European Conference, Proceedings. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 15085 LNCS, Springer Science and Business Media Deutschland GmbH, pp. 270-287, 18th European Conference on Computer Vision, ECCV 2024, Milan, Italy, 29/09/24. https://doi.org/10.1007/978-3-031-73383-3_16

Cross-Platform Video Person ReID: A New Benchmark Dataset and Adaptation Approach. / Zhang, Shizhou; Luo, Wenlong; Cheng, De et al.
Computer Vision – ECCV 2024 - 18th European Conference, Proceedings. ed. / Aleš Leonardis; Elisa Ricci; Stefan Roth; Olga Russakovsky; Torsten Sattler; Gül Varol. Springer Science and Business Media Deutschland GmbH, 2025. p. 270-287 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 15085 LNCS).

TY - GEN

T1 - Cross-Platform Video Person ReID

T2 - 18th European Conference on Computer Vision, ECCV 2024

AU - Zhang, Shizhou

AU - Luo, Wenlong

AU - Cheng, De

AU - Yang, Qingchun

AU - Ran, Lingyan

AU - Xing, Yinghui

AU - Zhang, Yanning

N1 - Publisher Copyright: © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.

PY - 2025

Y1 - 2025

AB - In this paper, we construct a large-scale benchmark dataset for Ground-to-Aerial Video-based person Re-Identification, named G2A-VReID, which comprises 185,907 images and 5,576 tracklets, featuring 2,788 distinct identities. To our knowledge, this is the first dataset for video ReID under Ground-to-Aerial scenarios. G2A-VReID dataset has the following characteristics: 1) Drastic view changes; 2) Large number of annotated identities; 3) Rich outdoor scenarios; 4) Huge difference in resolution. Additionally, we propose a new benchmark approach for cross-platform ReID by transforming the cross-platform visual alignment problem into visual-semantic alignment through vision-language model (i.e., CLIP) and applying a parameter-efficient Video Set-Level-Adapter module to adapt image-based foundation model to video ReID tasks, termed VSLA-CLIP. Besides, to further reduce the great discrepancy across the platforms, we also devise the platform-bridge prompts for efficient visual feature alignment. Extensive experiments demonstrate the superiority of the proposed method on all existing video ReID datasets and our proposed G2A-VReID dataset. The code and datasets are available at https://github.com/FHR-L/VSLA-CLIP.

KW - Dataset

KW - Ground-to-Aerial

KW - Person Re-Identification

UR - http://www.scopus.com/inward/record.url?scp=85209409569&partnerID=8YFLogxK

U2 - 10.1007/978-3-031-73383-3_16

DO - 10.1007/978-3-031-73383-3_16

M3 - Conference contribution

AN - SCOPUS:85209409569

SN - 9783031733826

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 270

EP - 287

BT - Computer Vision – ECCV 2024 - 18th European Conference, Proceedings

A2 - Leonardis, Aleš

A2 - Ricci, Elisa

A2 - Roth, Stefan

A2 - Russakovsky, Olga

A2 - Sattler, Torsten

A2 - Varol, Gül

PB - Springer Science and Business Media Deutschland GmbH

Y2 - 29 September 2024 through 4 October 2024

ER -

Zhang S, Luo W, Cheng D, Yang Q, Ran L, Xing Y et al. Cross-Platform Video Person ReID: A New Benchmark Dataset and Adaptation Approach. In Leonardis A, Ricci E, Roth S, Russakovsky O, Sattler T, Varol G, editors, Computer Vision – ECCV 2024 - 18th European Conference, Proceedings. Springer Science and Business Media Deutschland GmbH. 2025. p. 270-287. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). doi: 10.1007/978-3-031-73383-3_16