Intermediary-Guided Bidirectional Spatial-Temporal Aggregation Network for Video-Based Visible-Infrared Person Re-Identification

Huafeng Li; Minghui Liu; Zhanxuan Hu; Feiping Nie; Zhengtao Yu

doi:10.1109/TCSVT.2023.3246091

School of Artificial Intelligence, OPtics and Electronics

Research output: Contribution to journal › Article › peer-review

25 Scopus citations

Abstract

This work focuses on the task of Video-based Visible-Infrared Person Re-Identification, a promising technique for achieving 24-hour surveillance systems. Two main issues in this field are modality discrepancy mitigating and spatial-temporal information mining. In this work, we propose a novel method, named Intermediary-guided Bidirectional spatial-temporal Aggregation Network (IBAN), to address both issues at once. Specifically, IBAN is designed to learn modality-irrelevant features by leveraging the anaglyph data of pedestrian images to serve as the intermediary. Furthermore, a bidirectional spatial-temporal aggregation module is introduced to exploit the spatial-temporal information of video data, while mitigating the impact of noisy image frames. Finally, we design an Easy-sample-based loss to guide the final embedding space and further improve the model's generalization performance. Extensive experiments on Video-based Visible-Infrared benchmarks show that IBAN achieves promising results and outperforms the state-of-the-art ReID methods by a large margin, improving the rank-1/mAP by 1.29%/3.46% at the Infrared to Visible situation, and by 5.04%/3.27% at the Visible to Infrared situation. The source code of the proposed method will be released at https://github.com/lhf12278/IBAN.

Original language	English
Pages (from-to)	4962-4972
Number of pages	11
Journal	IEEE Transactions on Circuits and Systems for Video Technology
Volume	33
Issue number	9
DOIs	https://doi.org/10.1109/TCSVT.2023.3246091
State	Published - 1 Sep 2023

Keywords

Visible-infrared person re-identification
anaglyph data
bidirectional spatial-temporal aggregation
modality discrepancy

Access to Document

10.1109/TCSVT.2023.3246091

Cite this

@article{fe56e94db4164ae580cf5bc007b6df76,

title = "Intermediary-Guided Bidirectional Spatial-Temporal Aggregation Network for Video-Based Visible-Infrared Person Re-Identification",

abstract = "This work focuses on the task of Video-based Visible-Infrared Person Re-Identification, a promising technique for achieving 24-hour surveillance systems. Two main issues in this field are modality discrepancy mitigating and spatial-temporal information mining. In this work, we propose a novel method, named Intermediary-guided Bidirectional spatial-temporal Aggregation Network (IBAN), to address both issues at once. Specifically, IBAN is designed to learn modality-irrelevant features by leveraging the anaglyph data of pedestrian images to serve as the intermediary. Furthermore, a bidirectional spatial-temporal aggregation module is introduced to exploit the spatial-temporal information of video data, while mitigating the impact of noisy image frames. Finally, we design an Easy-sample-based loss to guide the final embedding space and further improve the model's generalization performance. Extensive experiments on Video-based Visible-Infrared benchmarks show that IBAN achieves promising results and outperforms the state-of-the-art ReID methods by a large margin, improving the rank-1/mAP by 1.29%/3.46% at the Infrared to Visible situation, and by 5.04%/3.27% at the Visible to Infrared situation. The source code of the proposed method will be released at https://github.com/lhf12278/IBAN.",

keywords = "Visible-infrared person re-identification, anaglyph data, bidirectional spatial-temporal aggregation, modality discrepancy",

author = "Huafeng Li and Minghui Liu and Zhanxuan Hu and Feiping Nie and Zhengtao Yu",

note = "Publisher Copyright: {\textcopyright} 1991-2012 IEEE.",

year = "2023",

month = sep,

day = "1",

doi = "10.1109/TCSVT.2023.3246091",

language = "English",

volume = "33",

pages = "4962--4972",

journal = "IEEE Transactions on Circuits and Systems for Video Technology",

issn = "1051-8215",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

number = "9",

}

TY - JOUR

T1 - Intermediary-Guided Bidirectional Spatial-Temporal Aggregation Network for Video-Based Visible-Infrared Person Re-Identification

AU - Li, Huafeng

AU - Liu, Minghui

AU - Hu, Zhanxuan

AU - Nie, Feiping

AU - Yu, Zhengtao

PY - 2023/9/1

Y1 - 2023/9/1

AB - This work focuses on the task of Video-based Visible-Infrared Person Re-Identification, a promising technique for achieving 24-hour surveillance systems. Two main issues in this field are modality discrepancy mitigating and spatial-temporal information mining. In this work, we propose a novel method, named Intermediary-guided Bidirectional spatial-temporal Aggregation Network (IBAN), to address both issues at once. Specifically, IBAN is designed to learn modality-irrelevant features by leveraging the anaglyph data of pedestrian images to serve as the intermediary. Furthermore, a bidirectional spatial-temporal aggregation module is introduced to exploit the spatial-temporal information of video data, while mitigating the impact of noisy image frames. Finally, we design an Easy-sample-based loss to guide the final embedding space and further improve the model's generalization performance. Extensive experiments on Video-based Visible-Infrared benchmarks show that IBAN achieves promising results and outperforms the state-of-the-art ReID methods by a large margin, improving the rank-1/mAP by 1.29%/3.46% at the Infrared to Visible situation, and by 5.04%/3.27% at the Visible to Infrared situation. The source code of the proposed method will be released at https://github.com/lhf12278/IBAN.

KW - Visible-infrared person re-identification

KW - anaglyph data

KW - bidirectional spatial-temporal aggregation

KW - modality discrepancy

UR - http://www.scopus.com/inward/record.url?scp=85149388901&partnerID=8YFLogxK

U2 - 10.1109/TCSVT.2023.3246091

DO - 10.1109/TCSVT.2023.3246091

M3 - Article

AN - SCOPUS:85149388901

SN - 1051-8215

VL - 33

SP - 4962

EP - 4972

JO - IEEE Transactions on Circuits and Systems for Video Technology

JF - IEEE Transactions on Circuits and Systems for Video Technology

IS - 9

ER -
