TY - GEN
T1 - Path Planning Technology of Unmanned Vehicle Based on Improved Deep Reinforcement Learning
AU - Zhang, Kai
AU - Wang, Luhe
AU - Hu, Jinwen
AU - Xu, Zhao
AU - Guo, Chubing
N1 - Publisher Copyright:
© 2021 Technical Committee on Control Theory, Chinese Association of Automation.
PY - 2021/7/26
Y1 - 2021/7/26
N2 - As a basic problem of unmanned vehicle navigation control, path planning has been widely studied. Reinforcement learning (RL) has been found to be an effective approach to path optimization for highly nonlinear and unmodeled dynamics. However, RL-based methods suffer from the "curse of dimensionality" in high-dimensional state spaces. In this paper, the path planning of an unmanned vehicle with collision avoidance is considered, and an improved Deep Q-Network (DQN) algorithm is proposed to reduce the computational load in the high-dimensional state space. First, the states, actions, and rewards are defined according to the task requirements, and a smoothing function is introduced as an additional penalty term to modify the basic reward function. Then, the two-dimensional grid of the state space is mapped to a grayscale image, which is used as the input to the neural network, i.e., the Q-network. Finally, simulation results show that the modified DQN algorithm is more stable and that the fluctuation frequency is significantly reduced.
AB - As a basic problem of unmanned vehicle navigation control, path planning has been widely studied. Reinforcement learning (RL) has been found to be an effective approach to path optimization for highly nonlinear and unmodeled dynamics. However, RL-based methods suffer from the "curse of dimensionality" in high-dimensional state spaces. In this paper, the path planning of an unmanned vehicle with collision avoidance is considered, and an improved Deep Q-Network (DQN) algorithm is proposed to reduce the computational load in the high-dimensional state space. First, the states, actions, and rewards are defined according to the task requirements, and a smoothing function is introduced as an additional penalty term to modify the basic reward function. Then, the two-dimensional grid of the state space is mapped to a grayscale image, which is used as the input to the neural network, i.e., the Q-network. Finally, simulation results show that the modified DQN algorithm is more stable and that the fluctuation frequency is significantly reduced.
KW - DQN
KW - path planning
KW - Reinforcement learning
UR - http://www.scopus.com/inward/record.url?scp=85117292825&partnerID=8YFLogxK
U2 - 10.23919/CCC52363.2021.9549620
DO - 10.23919/CCC52363.2021.9549620
M3 - Conference contribution
AN - SCOPUS:85117292825
T3 - Chinese Control Conference, CCC
SP - 8392
EP - 8397
BT - Proceedings of the 40th Chinese Control Conference, CCC 2021
A2 - Peng, Chen
A2 - Sun, Jian
PB - IEEE Computer Society
T2 - 40th Chinese Control Conference, CCC 2021
Y2 - 26 July 2021 through 28 July 2021
ER -