Learning geometry information of target for visual object tracking with siamese networks

Hang Chen; Weiguo Zhang; Danghui Yan

doi:10.3390/s21237790

School of Automation

Northwestern Polytechnical University Xian

Research output: Contribution to journal › Article › peer-review

2 Citations (Scopus)

Abstract

Recently, Siamese architecture has been widely used in the field of visual tracking, and has achieved great success. Most Siamese network based trackers aggregate the target information of two branches by cross-correlation. However, since the location of the sampling points in the search feature area is pre-fixed in cross-correlation operation, these trackers suffer from either background noise influence or missing foreground information. Moreover, the cross-correlation between the template and the search area neglects the geometry information of the target. In this paper, we propose a Siamese deformable cross-correlation network to model the geometric structure of target and improve the performance of visual tracking. We propose to learn an offset field end-to-end in cross-correlation. With the guidance of the offset field, the sampling in the search image area can adapt to the deformation of the target, and realize the modeling of the geometric structure of the target. We further propose an online classification sub-network to model the variation of target appearance and enhance the robustness of the tracker. Extensive experiments are conducted on four challenging benchmarks, including OTB2015, VOT2018, VOT2019 and UAV123. The results demonstrate that our tracker achieves state-of-the-art performance.
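The abstract contrasts cross-correlation at pre-fixed sampling points with sampling guided by an offset field. The record gives no equations, so the following is only a minimal NumPy sketch of that general idea and not the authors' implementation. The function names, the offset tensor layout `(out_h, out_w, k_h, k_w, 2)`, and the use of bilinear interpolation are all assumptions made for illustration. In the paper the offset field is learned end-to-end, whereas here it is simply passed in as an argument.

```python
import numpy as np


def bilinear_sample(feat, y, x):
    """Sample feat (C, H, W) at a real-valued location (y, x).

    Locations outside the feature map contribute zeros.
    """
    channels, height, width = feat.shape
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    out = np.zeros(channels)
    # Blend the four integer neighbours with bilinear weights.
    for yy, wy in ((y0, 1.0 - (y - y0)), (y0 + 1, y - y0)):
        for xx, wx in ((x0, 1.0 - (x - x0)), (x0 + 1, x - x0)):
            if 0 <= yy < height and 0 <= xx < width:
                out += wy * wx * feat[:, yy, xx]
    return out


def deformable_xcorr(template, search, offsets=None):
    """Cross-correlate template (C, kh, kw) with search (C, H, W).

    offsets has the hypothetical layout (oh, ow, kh, kw, 2) holding a
    (dy, dx) shift for every template cell at every output position.
    With offsets=None the sampling grid is fixed, which reduces to
    ordinary cross-correlation.
    """
    _, kh, kw = template.shape
    _, height, width = search.shape
    oh, ow = height - kh + 1, width - kw + 1
    if offsets is None:
        offsets = np.zeros((oh, ow, kh, kw, 2))
    response = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            for u in range(kh):
                for v in range(kw):
                    dy, dx = offsets[i, j, u, v]
                    # Shift the sampling point by the offset before sampling.
                    sampled = bilinear_sample(search, i + u + dy, j + v + dx)
                    response[i, j] += template[:, u, v] @ sampled
    return response
```

Calling `deformable_xcorr(template, search)` without offsets gives the fixed-grid response map, and passing a non-zero offset array moves each sampling point individually. The explicit loops are kept for readability rather than speed.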

Original language	English
Article number	7790
Journal	Sensors
Volume	21
Issue number	23
DOI	https://doi.org/10.3390/s21237790
Publication status	Published - Dec 2021

Cite this

@article{9034d9ea2133450fa850fbdd1e8c67f8,

title = "Learning geometry information of target for visual object tracking with siamese networks",

abstract = "Recently, Siamese architecture has been widely used in the field of visual tracking, and has achieved great success. Most Siamese network based trackers aggregate the target information of two branches by cross-correlation. However, since the location of the sampling points in the search feature area is pre-fixed in cross-correlation operation, these trackers suffer from either background noise influence or missing foreground information. Moreover, the cross-correlation between the template and the search area neglects the geometry information of the target. In this paper, we propose a Siamese deformable cross-correlation network to model the geometric structure of target and improve the performance of visual tracking. We propose to learn an offset field end-to-end in cross-correlation. With the guidance of the offset field, the sampling in the search image area can adapt to the deformation of the target, and realize the modeling of the geometric structure of the target. We further propose an online classification sub-network to model the variation of target appearance and enhance the robustness of the tracker. Extensive experiments are conducted on four challenging benchmarks, including OTB2015, VOT2018, VOT2019 and UAV123. The results demonstrate that our tracker achieves state-of-the-art performance.",

keywords = "Deformable convolution, Deformable cross-correlation, Siamese network, Visual object tracking",

author = "Hang Chen and Weiguo Zhang and Danghui Yan",

note = "Publisher Copyright: {\textcopyright} 2021 by the authors. Licensee MDPI, Basel, Switzerland.",

year = "2021",

month = dec,

doi = "10.3390/s21237790",

language = "English",

volume = "21",

journal = "Sensors",

issn = "1424-8220",

publisher = "Multidisciplinary Digital Publishing Institute (MDPI)",

number = "23",

}

TY - JOUR

T1 - Learning geometry information of target for visual object tracking with siamese networks

AU - Chen, Hang

AU - Zhang, Weiguo

AU - Yan, Danghui

PY - 2021/12

Y1 - 2021/12

N2 - Recently, Siamese architecture has been widely used in the field of visual tracking, and has achieved great success. Most Siamese network based trackers aggregate the target information of two branches by cross-correlation. However, since the location of the sampling points in the search feature area is pre-fixed in cross-correlation operation, these trackers suffer from either background noise influence or missing foreground information. Moreover, the cross-correlation between the template and the search area neglects the geometry information of the target. In this paper, we propose a Siamese deformable cross-correlation network to model the geometric structure of target and improve the performance of visual tracking. We propose to learn an offset field end-to-end in cross-correlation. With the guidance of the offset field, the sampling in the search image area can adapt to the deformation of the target, and realize the modeling of the geometric structure of the target. We further propose an online classification sub-network to model the variation of target appearance and enhance the robustness of the tracker. Extensive experiments are conducted on four challenging benchmarks, including OTB2015, VOT2018, VOT2019 and UAV123. The results demonstrate that our tracker achieves state-of-the-art performance.

AB - Recently, Siamese architecture has been widely used in the field of visual tracking, and has achieved great success. Most Siamese network based trackers aggregate the target information of two branches by cross-correlation. However, since the location of the sampling points in the search feature area is pre-fixed in cross-correlation operation, these trackers suffer from either background noise influence or missing foreground information. Moreover, the cross-correlation between the template and the search area neglects the geometry information of the target. In this paper, we propose a Siamese deformable cross-correlation network to model the geometric structure of target and improve the performance of visual tracking. We propose to learn an offset field end-to-end in cross-correlation. With the guidance of the offset field, the sampling in the search image area can adapt to the deformation of the target, and realize the modeling of the geometric structure of the target. We further propose an online classification sub-network to model the variation of target appearance and enhance the robustness of the tracker. Extensive experiments are conducted on four challenging benchmarks, including OTB2015, VOT2018, VOT2019 and UAV123. The results demonstrate that our tracker achieves state-of-the-art performance.

KW - Deformable convolution

KW - Deformable cross-correlation

KW - Siamese network

KW - Visual object tracking

UR - http://www.scopus.com/inward/record.url?scp=85119583892&partnerID=8YFLogxK

U2 - 10.3390/s21237790

DO - 10.3390/s21237790

M3 - Article

C2 - 34883790

AN - SCOPUS:85119583892

SN - 1424-8220

VL - 21

JO - Sensors

JF - Sensors

IS - 23

M1 - 7790

ER -
