Deep learning-based pose prediction for visual servoing of robotic manipulators using image similarity

Yaozhen He, Jian Gao, Yimin Chen

Research output: Contribution to journal › Article › peer-review

9 Scopus citations

Abstract

The accuracy of pose prediction is crucial in learning-based visual servoing. Motivated by the fact that the more similar two observed images are, the closer their camera poses are, we propose a joint training strategy with a two-part loss function in this paper. One part is the least absolute deviation (L1) loss, defined by the error between the predicted pose and the pose label. The other is the mean similarity image measurement (MSIM) loss, which reflects the brightness, contrast, and structural similarity between the input image and the image corresponding to the predicted pose. Meanwhile, a data generator based on spherical projection is created to sample training data uniformly for a CNN model, and position-based visual servoing (PBVS) is designed for a robotic manipulator after pose prediction. Numerical simulations and real experiments are conducted in a virtual environment and with a UR3 manipulator. The results show that the proposed method achieves more accurate pose prediction, is robust to occlusion disturbances, and realizes PBVS using monocular images.
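As a rough illustration of the two-part loss described above, the sketch below combines an L1 pose error with an image-dissimilarity term based on the standard (global) SSIM definition. This is a minimal sketch, not the authors' implementation: the weighting factor `alpha`, the use of `1 - SSIM` as the dissimilarity term, and the constants `c1`/`c2` are assumptions, and the rendered image at the predicted pose is taken as a given array rather than produced by a renderer.

```python
import numpy as np

def l1_pose_loss(pred_pose, pose_label):
    # Least absolute deviation (L1) between predicted pose and pose label.
    return np.mean(np.abs(pred_pose - pose_label))

def ssim(img_a, img_b, c1=0.01**2, c2=0.03**2):
    # Global SSIM combining brightness, contrast, and structure terms.
    # c1, c2 are the usual stabilizing constants for images scaled to [0, 1].
    mu_a, mu_b = img_a.mean(), img_b.mean()
    var_a, var_b = img_a.var(), img_b.var()
    cov = ((img_a - mu_a) * (img_b - mu_b)).mean()
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / (
        (mu_a**2 + mu_b**2 + c1) * (var_a + var_b + c2))

def joint_loss(pred_pose, pose_label, input_img, rendered_img, alpha=0.5):
    # Two-part loss: L1 pose error plus (1 - SSIM) image dissimilarity.
    # alpha is a hypothetical weighting factor, not given in the abstract.
    return (alpha * l1_pose_loss(pred_pose, pose_label)
            + (1 - alpha) * (1.0 - ssim(input_img, rendered_img)))
```

When the predicted pose matches the label and the rendered image matches the input, both terms vanish, so the loss is zero; any pose error or image dissimilarity increases it, which is the property the joint training strategy exploits.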

Original language: English
Pages (from-to): 343-352
Number of pages: 10
Journal: Neurocomputing
Volume: 491
State: Published - 28 Jun 2022

Keywords

  • PBVS
  • Pose prediction
  • Similarity measurement
  • Visual servoing

