TY - JOUR
T1 - Learning End-to-End Visual Servoing Using an Improved Soft Actor-Critic Approach With Centralized Novelty Measurement
AU - Gao, Jian
AU - He, Yaozhen
AU - Chen, Yimin
AU - Li, Yufeng
N1 - Publisher Copyright:
© 1963-2012 IEEE.
PY - 2023
Y1 - 2023
N2 - End-to-end visual servoing (VS) based on reinforcement learning (RL) can simplify the design of features and control laws and scales well when combined with neural networks. However, RL-based VS tasks are difficult to operate in continuous state and action spaces because space exploration is hard and training converges slowly. Hence, this article presents a novelty measurement method based on centralized features extracted by a neural network, which calculates the novelty of each visited state to encourage the RL agent to explore. Moreover, we propose a hybrid probability sampling method that improves prioritized experience replay (PER) based on the temporal-difference (TD) error by integrating the intrinsic and external rewards, which represent the novelty and quality of transitions in the replay buffer, respectively, to promote convergence during training. Finally, we develop an end-to-end VS scheme based on the maximum-entropy RL algorithm soft actor-critic (SAC). Several simulated experiments for end-to-end VS are designed in CoppeliaSim, where target detection information serves as the agent's input. The results show that our method's reward value and completion rate are 0.35% and 8.0% higher, respectively, than those of the SAC VS baseline. We also conduct experiments to verify the effectiveness of the proposed algorithm.
KW - Intrinsic reward
KW - novelty measure
KW - reinforcement learning (RL)
KW - sampling optimization
KW - visual servoing (VS)
UR - http://www.scopus.com/inward/record.url?scp=85159821499&partnerID=8YFLogxK
U2 - 10.1109/TIM.2023.3273687
DO - 10.1109/TIM.2023.3273687
M3 - Article
AN - SCOPUS:85159821499
SN - 0018-9456
VL - 72
JO - IEEE Transactions on Instrumentation and Measurement
JF - IEEE Transactions on Instrumentation and Measurement
M1 - 2514512
ER -