TY - JOUR
T1 - Path Planning Algorithm for Mobile Robots Based on an Improved Deep Double Q-Network
AU - Zhang, Lei
AU - Mu, Yashuang
AU - Pan, Quan
N1 - Publisher Copyright:
© 2024 Science Press. All rights reserved.
PY - 2024
Y1 - 2024
N2 - To address the incomplete search and slow convergence of conventional mobile robot path planning methods based on the deep double Q-network (DDQN), we propose an improved DDQN (I-DDQN) learning algorithm. First, the I-DDQN algorithm uses a dueling network architecture to estimate the value function of the DDQN algorithm. Second, we propose a robot path exploration strategy based on a two-layer controller structure, in which the value function of the upper controller explores the locally optimal action of the mobile robot and the value function of the lower controller learns the global task strategy. In addition, during learning, the algorithm uses a prioritized experience replay mechanism for data collection and sampling and trains the network on mini-batches. Finally, we compare the proposed algorithm with the conventional DDQN algorithm and its improved variants in two simulation environments, OpenAI Gym and Gazebo. The experimental results show that the proposed I-DDQN algorithm outperforms the conventional DDQN algorithm and its improved variants on all evaluation indicators in both simulation environments and effectively overcomes incomplete path search and slow convergence in the same complex environment.
AB - To address the incomplete search and slow convergence of conventional mobile robot path planning methods based on the deep double Q-network (DDQN), we propose an improved DDQN (I-DDQN) learning algorithm. First, the I-DDQN algorithm uses a dueling network architecture to estimate the value function of the DDQN algorithm. Second, we propose a robot path exploration strategy based on a two-layer controller structure, in which the value function of the upper controller explores the locally optimal action of the mobile robot and the value function of the lower controller learns the global task strategy. In addition, during learning, the algorithm uses a prioritized experience replay mechanism for data collection and sampling and trains the network on mini-batches. Finally, we compare the proposed algorithm with the conventional DDQN algorithm and its improved variants in two simulation environments, OpenAI Gym and Gazebo. The experimental results show that the proposed I-DDQN algorithm outperforms the conventional DDQN algorithm and its improved variants on all evaluation indicators in both simulation environments and effectively overcomes incomplete path search and slow convergence in the same complex environment.
KW - dueling network architecture
KW - deep learning
KW - hierarchical deep reinforcement learning
KW - prioritized experience replay
KW - reinforcement learning
KW - robot path planning
UR - https://www.scopus.com/pages/publications/85199412822
U2 - 10.13976/j.cnki.xk.2024.3090
DO - 10.13976/j.cnki.xk.2024.3090
M3 - Article
AN - SCOPUS:85199412822
SN - 1002-0411
VL - 53
SP - 365
EP - 376
JO - Information and Control
JF - Information and Control
IS - 3
ER -