TY - JOUR
T1 - NCSiam: Reliable Matching via Neighborhood Consensus for Siamese-Based Object Tracking
AU - Lai, Pujian
AU - Cheng, Gong
AU - Zhang, Meili
AU - Ning, Jifeng
AU - Zheng, Xiangtao
AU - Han, Junwei
N1 - Publisher Copyright: © 1992-2012 IEEE.
PY - 2023
Y1 - 2023
AB - Accurate visual object tracking depends on capturing reliable correlations between the tracking target and the search region. However, the dominant Siamese-based trackers produce dense similarity maps in a single pass via a cross-correlation operation and do nothing to remedy the contamination caused by erroneous or ambiguous matches. In this paper, we propose a novel tracker, termed the neighborhood consensus constraint-based Siamese tracker (NCSiam), which applies a neighborhood consensus constraint to refine the produced correlation maps. The intuition behind our approach is that nearby erroneous or ambiguous matches can be resolved by analyzing a larger context of the scene that contains a unique match. Specifically, we devise a 4D convolution-based multi-level similarity refinement (MLSR) strategy. Taking the primary similarity maps obtained from cross-correlation as input, MLSR acquires reliable matches by analyzing neighborhood consensus patterns in 4D space, thus enhancing the discriminability between the tracking target and distractors. In addition, traditional Siamese-based trackers perform classification and regression directly on similarity response maps, which discards appearance and semantic information. Therefore, an appearance affinity decoder (AAD) is developed to take full advantage of the semantic information of the search region. To further improve performance, we design a task-specific disentanglement (TSD) module that decouples the learned representations into classification-specific and regression-specific embeddings. Extensive experiments are conducted on six challenging benchmarks: GOT-10k, TrackingNet, LaSOT, UAV123, OTB2015, and VOT2020. The results demonstrate the effectiveness of our method. The code will be available at https://github.com/laybebe/NCSiam.
KW - 4D convolution
KW - Appearance affinity decoder
KW - Siamese-based trackers
KW - Task-specific disentanglement
UR - http://www.scopus.com/inward/record.url?scp=85176893198&partnerID=8YFLogxK
U2 - 10.1109/TIP.2023.3329669
DO - 10.1109/TIP.2023.3329669
M3 - Article
C2 - 37938957
AN - SCOPUS:85176893198
SN - 1057-7149
VL - 32
SP - 6168
EP - 6182
JO - IEEE Transactions on Image Processing
JF - IEEE Transactions on Image Processing
ER -