
DST-Net: A closed-loop dual-stream transformer with identity-guided video matting for visible–infrared person re-identification

  • Northwestern Polytechnical University Xian

Research output: Contribution to journal › Article › peer-review

Abstract

Visible–infrared person re-identification in real-world surveillance video remains challenging due to spectrum-induced appearance gaps, cluttered backgrounds, and temporal perturbations. A dual-stream Transformer framework, DST-Net, is introduced to learn modality-specific and modality-shared representations for effective cross-modality alignment. Bidirectional cross-attention is employed to exchange complementary cues between visible and infrared streams, multi-factor graph optimization is used to enforce topology-consistent features, and a multi-mask triplet strategy is adopted to emphasize foreground-relevant supervision. Temporal Identity-Structured Matting is further incorporated to generate temporally consistent foreground alpha mattes, enabling a closed-loop detection–matting–recognition pipeline for robust retrieval. A large-scale surveillance-style benchmark, NPU-ReID, is also released, collected by an eight-camera synchronized RGB and infrared array. On SYSU-MM01, 84.16% Rank-1 and 79.63% mAP are achieved; on RegDB, 92.07% Rank-1 and 86.02% mAP are obtained under the visible-to-infrared setting; and on NPU-ReID, 94.41% Rank-1 and 84.92% mAP are reached. In real-world multi-camera tests, an average throughput of 32.95 fps is reported, together with 97% detection accuracy and 97% Rank-5 retrieval accuracy. The dataset and associated resources are available at https://github.com/YzZhu07/NPU-ReID.
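
The Rank-1 and mAP figures quoted above are standard retrieval metrics. As a rough illustration of how such numbers are typically computed from a query-by-gallery distance matrix, the following is a minimal sketch and not the authors' evaluation code. The function name `rank1_and_map` and the toy data are hypothetical, and benchmark-specific protocol details (for example camera filtering or repeated gallery sampling) are deliberately left out.

```python
import numpy as np


def rank1_and_map(dist, query_ids, gallery_ids):
    """Return (Rank-1, mAP) for a query-by-gallery distance matrix.

    Hypothetical helper for illustration only; smaller distance means
    a better match. Queries with no true match in the gallery are skipped.
    """
    dist = np.asarray(dist, dtype=float)
    query_ids = np.asarray(query_ids)
    gallery_ids = np.asarray(gallery_ids)

    rank1_hits = []
    average_precisions = []
    for q in range(dist.shape[0]):
        # Sort gallery entries from most to least similar for this query.
        order = np.argsort(dist[q], kind="stable")
        matches = gallery_ids[order] == query_ids[q]
        if not matches.any():
            continue

        # Rank-1: is the top-ranked gallery entry the correct identity?
        rank1_hits.append(float(matches[0]))

        # Average precision: mean of precision at each correct hit.
        hit_positions = np.flatnonzero(matches)
        precision_at_hits = np.arange(1, len(hit_positions) + 1) / (hit_positions + 1)
        average_precisions.append(precision_at_hits.mean())

    return float(np.mean(rank1_hits)), float(np.mean(average_precisions))


# Toy data (made up): two queries against a three-image gallery.
toy_dist = [[0.1, 0.5, 0.3],
            [0.2, 0.4, 0.9]]
toy_rank1, toy_map = rank1_and_map(toy_dist, query_ids=[1, 2], gallery_ids=[1, 2, 1])
print(toy_rank1, toy_map)  # 0.5 0.75
```

In the toy case the first query ranks both of its true matches first (AP = 1.0), while the second query finds its only match at rank 2 (AP = 0.5), which gives Rank-1 = 0.5 and mAP = 0.75.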

Original language: English
Article number: 133545
Journal: Neurocomputing
Volume: 684
State: Published - 1 Jul 2026

Keywords

  • Dual-stream transformer
  • Graph optimization
  • Spatio-temporal matting
  • Visible–infrared person re-identification
