Intermediary-Guided Bidirectional Spatial-Temporal Aggregation Network for Video-Based Visible-Infrared Person Re-Identification

Huafeng Li, Minghui Liu, Zhanxuan Hu, Feiping Nie, Zhengtao Yu

Research output: Contribution to journal › Article › peer-review

22 Scopus citations

Abstract

This work focuses on video-based visible-infrared person re-identification, a promising technique for building 24-hour surveillance systems. Two main challenges in this field are mitigating the modality discrepancy and mining spatial-temporal information. We propose a novel method, the Intermediary-guided Bidirectional spatial-temporal Aggregation Network (IBAN), to address both issues at once. Specifically, IBAN learns modality-irrelevant features by using anaglyph data of pedestrian images as the intermediary. Furthermore, a bidirectional spatial-temporal aggregation module exploits the spatial-temporal information in video data while mitigating the impact of noisy image frames. Finally, we design an Easy-sample-based loss to guide the final embedding space and further improve the model's generalization performance. Extensive experiments on video-based visible-infrared benchmarks show that IBAN outperforms state-of-the-art ReID methods, improving rank-1/mAP by 1.29%/3.46% in the infrared-to-visible setting and by 5.04%/3.27% in the visible-to-infrared setting. The source code of the proposed method will be released at https://github.com/lhf12278/IBAN.
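The abstract reports results as rank-1 accuracy and mAP. As a rough illustration of what those two retrieval metrics measure, the sketch below computes them from a query-to-gallery distance matrix using their standard textbook definitions. It is not the authors' evaluation protocol: the function name `rank1_and_map` and the toy data are hypothetical, and benchmark-specific rules (such as camera or modality filtering of the gallery) are deliberately left out.

```python
import numpy as np


def rank1_and_map(dist, query_ids, gallery_ids):
    """Return (rank-1 accuracy, mean average precision).

    dist[i, j] is the distance between query i and gallery item j;
    smaller values mean the two samples are more similar.
    Hypothetical helper: no camera or modality filtering is applied.
    """
    dist = np.asarray(dist, dtype=float)
    query_ids = np.asarray(query_ids)
    gallery_ids = np.asarray(gallery_ids)

    rank1_hits = []
    average_precisions = []
    for i in range(dist.shape[0]):
        # Rank gallery items from most to least similar for this query.
        order = np.argsort(dist[i], kind="stable")
        matches = gallery_ids[order] == query_ids[i]
        if not matches.any():
            continue  # a query with no true match cannot be scored

        # Rank-1: is the top-ranked gallery item the same identity?
        rank1_hits.append(float(matches[0]))

        # Average precision: mean of the precision at each true match.
        hit_positions = np.flatnonzero(matches)
        hits_so_far = np.arange(1, len(hit_positions) + 1)
        average_precisions.append(np.mean(hits_so_far / (hit_positions + 1)))

    if not rank1_hits:
        raise ValueError("no query has a matching identity in the gallery")
    return float(np.mean(rank1_hits)), float(np.mean(average_precisions))


if __name__ == "__main__":
    # Toy data (made up): two queries of identity 1, three gallery items.
    toy_dist = [[0.1, 0.9, 0.5],
                [0.8, 0.2, 0.3]]
    rank1, mean_ap = rank1_and_map(toy_dist, [1, 1], [1, 2, 1])
    print(f"rank-1 = {rank1:.4f}, mAP = {mean_ap:.4f}")
```

In the cross-modality setting, the queries would come from one modality (for example, infrared) and the gallery from the other, which is what the infrared-to-visible and visible-to-infrared settings refer to.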

Original language: English
Pages (from-to): 4962-4972
Number of pages: 11
Journal: IEEE Transactions on Circuits and Systems for Video Technology
Volume: 33
Issue number: 9
State: Published - 1 Sep 2023

Keywords

  • Visible-infrared person re-identification
  • anaglyph data
  • bidirectional spatial-temporal aggregation
  • modality discrepancy
