TY - JOUR
T1 - Spatial temporal and channel aware network for video-based person re-identification
AU - Fu, Hui
AU - Zhang, Ke
AU - Li, Haoyu
AU - Wang, Jingyu
AU - Wang, Zhen
N1 - Publisher Copyright:
© 2021
PY - 2022/2
Y1 - 2022/2
N2 - As a challenging computer vision task, video-based person Re-IDentification (Re-ID) has been intensively studied, and recent works have achieved a series of satisfactory results by capturing spatial temporal relationships. However, extensive observations have found that the feature vectors generated by a convolutional neural network contain considerable redundant information in the channel dimension, an issue that is seldom investigated. A Spatial Temporal and Channel Aware Network (STCAN) for video-based Re-ID is studied in this paper. It jointly considers spatial, temporal, and channel information. Firstly, a Spatial Attention Enhanced (SAE) convolutional network is developed as the backbone network to learn spatially enhanced features from video frames. Secondly, a Channel Segmentation and Group Shuffle (CSGS) convolution module is designed to jointly address temporal and channel relations. Finally, a Two Branch Weighted Fusion (TBWF) mechanism is introduced to enhance the robustness of the Re-ID network by fusing the outputs of the SAE backbone network and the CSGS module. Comprehensive experiments are conducted on three large-scale datasets: MARS, LS-VID, and P-DESTRE. The experimental results show that STCAN effectively improves the performance of video-based Re-ID and outperforms several state-of-the-art methods.
AB - As a challenging computer vision task, video-based person Re-IDentification (Re-ID) has been intensively studied, and recent works have achieved a series of satisfactory results by capturing spatial temporal relationships. However, extensive observations have found that the feature vectors generated by a convolutional neural network contain considerable redundant information in the channel dimension, an issue that is seldom investigated. A Spatial Temporal and Channel Aware Network (STCAN) for video-based Re-ID is studied in this paper. It jointly considers spatial, temporal, and channel information. Firstly, a Spatial Attention Enhanced (SAE) convolutional network is developed as the backbone network to learn spatially enhanced features from video frames. Secondly, a Channel Segmentation and Group Shuffle (CSGS) convolution module is designed to jointly address temporal and channel relations. Finally, a Two Branch Weighted Fusion (TBWF) mechanism is introduced to enhance the robustness of the Re-ID network by fusing the outputs of the SAE backbone network and the CSGS module. Comprehensive experiments are conducted on three large-scale datasets: MARS, LS-VID, and P-DESTRE. The experimental results show that STCAN effectively improves the performance of video-based Re-ID and outperforms several state-of-the-art methods.
KW - Channel segmentation
KW - Group shuffle convolution
KW - Spatial temporal feature
KW - Video-based Re-ID
UR - http://www.scopus.com/inward/record.url?scp=85121731831&partnerID=8YFLogxK
U2 - 10.1016/j.imavis.2021.104356
DO - 10.1016/j.imavis.2021.104356
M3 - Article
AN - SCOPUS:85121731831
SN - 0262-8856
VL - 118
JO - Image and Vision Computing
JF - Image and Vision Computing
M1 - 104356
ER -