TY - JOUR
T1 - An Adaptive Correlation Filtering Method for Text-Based Person Search
AU - Sun, Mengyang
AU - Suo, Wei
AU - Wang, Peng
AU - Niu, Kai
AU - Liu, Le
AU - Lin, Guosheng
AU - Zhang, Yanning
AU - Wu, Qi
N1 - Publisher Copyright:
© The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2024.
PY - 2024/10
Y1 - 2024/10
AB - Text-based person search aims to align person images with natural language descriptions and can be widely used in the video surveillance field, for example in missing person search and suspect tracking. In this task, extracting distinct representations and aligning them among identities based on descriptions is a crucial yet challenging problem. Most previous methods rely on additional language parsers or vision techniques to identify and select the relevant regions and words from inputs. However, these methods suffer from heavy computation costs and error accumulation. Meanwhile, simply using horizontally segmented images to obtain local-level features would harm the reliability of models. To address these problems, we first present a novel Simple and Robust Correlation Filtering (SRCF) method, which is capable of effectively extracting key clues and aligning discriminative features. Different from previous works, we design two types of filtering modules (denoising filters and dictionary filters) to extract essential features and establish multi-modal mappings. Furthermore, although SRCF performs well, it still struggles with semantic ambiguity and uni-modal updating. Therefore, we further propose a Multi-modal Adaptive Correlation Filtering (MACF) method that adaptively learns the vital regions and keywords with a shared update strategy. Meanwhile, we introduce a new mutually conditional gate to dynamically control the updating process of filters. Extensive experiments demonstrate that both proposed methods improve the robustness and reliability of the model and achieve better performance on two text-based person search datasets.
KW - Correlation filtering
KW - Neural network
KW - Text-based person search
KW - Vision and language
UR - http://www.scopus.com/inward/record.url?scp=85193235783&partnerID=8YFLogxK
U2 - 10.1007/s11263-024-02094-8
DO - 10.1007/s11263-024-02094-8
M3 - Article
AN - SCOPUS:85193235783
SN - 0920-5691
VL - 132
SP - 4440
EP - 4455
JO - International Journal of Computer Vision
JF - International Journal of Computer Vision
IS - 10
ER -