TY - JOUR
T1 - Spatial temporal and channel aware network for video-based person re-identification
AU - Fu, Hui
AU - Zhang, Ke
AU - Li, Haoyu
AU - Wang, Jingyu
AU - Wang, Zhen
N1 - Publisher Copyright:
© 2021
PY - 2022/2
Y1 - 2022/2
N2 - As a challenging computer vision task, video-based person Re-IDentification (Re-ID) has been intensively studied, and recent works have achieved a series of satisfactory results by capturing spatial temporal relationships. However, extensive observations have found that the feature vectors generated by a convolutional neural network contain considerable redundant information in the channel dimension, an issue that is seldom investigated. A Spatial Temporal and Channel Aware Network (STCAN) for video-based Re-ID is studied in this paper. It jointly considers spatial, temporal, and channel information. Firstly, a Spatial Attention Enhanced (SAE) convolutional network is developed as the backbone network to learn spatially enhanced features from video frames. Secondly, a Channel Segmentation and Group Shuffle (CSGS) convolution module is designed to jointly address temporal and channel relations. Finally, a Two Branch Weighted Fusion (TBWF) mechanism is introduced to enhance the robustness of the Re-ID network by fusing the outputs of the SAE backbone network and the CSGS module. Comprehensive experiments are conducted on three large-scale datasets: MARS, LS-VID, and P-DESTRE. The experimental results show that STCAN effectively improves the performance of video-based Re-ID and outperforms several state-of-the-art methods.
AB - As a challenging computer vision task, video-based person Re-IDentification (Re-ID) has been intensively studied, and recent works have achieved a series of satisfactory results by capturing spatial temporal relationships. However, extensive observations have found that the feature vectors generated by a convolutional neural network contain considerable redundant information in the channel dimension, an issue that is seldom investigated. A Spatial Temporal and Channel Aware Network (STCAN) for video-based Re-ID is studied in this paper. It jointly considers spatial, temporal, and channel information. Firstly, a Spatial Attention Enhanced (SAE) convolutional network is developed as the backbone network to learn spatially enhanced features from video frames. Secondly, a Channel Segmentation and Group Shuffle (CSGS) convolution module is designed to jointly address temporal and channel relations. Finally, a Two Branch Weighted Fusion (TBWF) mechanism is introduced to enhance the robustness of the Re-ID network by fusing the outputs of the SAE backbone network and the CSGS module. Comprehensive experiments are conducted on three large-scale datasets: MARS, LS-VID, and P-DESTRE. The experimental results show that STCAN effectively improves the performance of video-based Re-ID and outperforms several state-of-the-art methods.
KW - Channel segmentation
KW - Group shuffle convolution
KW - Spatial temporal feature
KW - Video-based Re-ID
UR - http://www.scopus.com/inward/record.url?scp=85121731831&partnerID=8YFLogxK
U2 - 10.1016/j.imavis.2021.104356
DO - 10.1016/j.imavis.2021.104356
M3 - Article
AN - SCOPUS:85121731831
SN - 0262-8856
VL - 118
JO - Image and Vision Computing
JF - Image and Vision Computing
M1 - 104356
ER -