TY - JOUR
T1 - Learning End-to-End Visual Servoing Using an Improved Soft Actor-Critic Approach With Centralized Novelty Measurement
AU - Gao, Jian
AU - He, Yaozhen
AU - Chen, Yimin
AU - Li, Yufeng
N1 - Publisher Copyright:
© 1963-2012 IEEE.
PY - 2023
Y1 - 2023
N2 - End-to-end visual servoing (VS) based on reinforcement learning (RL) can simplify the design of features and control laws and scales well when combined with neural networks. However, RL-based VS tasks are difficult to operate in continuous state and action spaces because space exploration is hard and training converges slowly. Hence, this article presents a novelty measurement method based on centralized features extracted by a neural network, which calculates the novelty of each visited state to encourage the RL agent to explore. Moreover, we propose a hybrid probability sampling method that improves prioritized experience replay (PER) based on the temporal-difference (TD) error by integrating the intrinsic and external rewards, which represent the novelty and quality of transitions in the replay buffer, respectively, to promote convergence during training. Finally, we develop an end-to-end VS scheme based on the maximum-entropy RL algorithm soft actor-critic (SAC). Several simulated experiments for end-to-end VS are designed in CoppeliaSim, where target detection information serves as the agent's input. The results show that our method's reward value and completion rate are 0.35% and 8.0% higher, respectively, than those of the SAC VS baseline. We also conduct experiments to verify the effectiveness of the proposed algorithm.
KW - Intrinsic reward
KW - novelty measure
KW - reinforcement learning (RL)
KW - sampling optimization
KW - visual servoing (VS)
UR - http://www.scopus.com/inward/record.url?scp=85159821499&partnerID=8YFLogxK
U2 - 10.1109/TIM.2023.3273687
DO - 10.1109/TIM.2023.3273687
M3 - Article
AN - SCOPUS:85159821499
SN - 0018-9456
VL - 72
JO - IEEE Transactions on Instrumentation and Measurement
JF - IEEE Transactions on Instrumentation and Measurement
M1 - 2514512
ER -