TY - JOUR
T1 - Intermediary-Guided Bidirectional Spatial-Temporal Aggregation Network for Video-Based Visible-Infrared Person Re-Identification
AU - Li, Huafeng
AU - Liu, Minghui
AU - Hu, Zhanxuan
AU - Nie, Feiping
AU - Yu, Zhengtao
N1 - Publisher Copyright:
© 2023 IEEE.
PY - 2023/9/1
Y1 - 2023/9/1
AB - This work focuses on the task of Video-based Visible-Infrared Person Re-Identification, a promising technique for achieving 24-hour surveillance systems. Two main issues in this field are modality discrepancy mitigation and spatial-temporal information mining. In this work, we propose a novel method, named Intermediary-guided Bidirectional spatial-temporal Aggregation Network (IBAN), to address both issues simultaneously. Specifically, IBAN is designed to learn modality-irrelevant features by leveraging the anaglyph data of pedestrian images as the intermediary. Furthermore, a bidirectional spatial-temporal aggregation module is introduced to exploit the spatial-temporal information of video data while mitigating the impact of noisy image frames. Finally, we design an Easy-sample-based loss to guide the final embedding space and further improve the model's generalization performance. Extensive experiments on Video-based Visible-Infrared benchmarks show that IBAN achieves promising results and outperforms state-of-the-art ReID methods by a large margin, improving rank-1/mAP by 1.29%/3.46% in the Infrared-to-Visible setting, and by 5.04%/3.27% in the Visible-to-Infrared setting. The source code of the proposed method will be released at https://github.com/lhf12278/IBAN.
KW - Visible-infrared person re-identification
KW - anaglyph data
KW - bidirectional spatial-temporal aggregation
KW - modality discrepancy
UR - http://www.scopus.com/inward/record.url?scp=85149388901&partnerID=8YFLogxK
U2 - 10.1109/TCSVT.2023.3246091
DO - 10.1109/TCSVT.2023.3246091
M3 - Article
AN - SCOPUS:85149388901
SN - 1051-8215
VL - 33
SP - 4962
EP - 4972
JO - IEEE Transactions on Circuits and Systems for Video Technology
JF - IEEE Transactions on Circuits and Systems for Video Technology
IS - 9
ER -