TY - JOUR
T1 - Research on a UAV Obstacle Avoidance Algorithm Based on a Deep Recurrent Double Q-Network
AU - Wei, Yao
AU - Liu, Zhicheng
AU - Cai, Bin
AU - Chen, Jiaxin
AU - Yang, Yao
AU - Zhang, Kai
N1 - Publisher Copyright:
©2022 Journal of Northwestern Polytechnical University.
PY - 2022/10
Y1 - 2022/10
N2 - Traditional reinforcement learning methods suffer from value-function overestimation and partial observability in robot motion planning, especially in UAV obstacle avoidance, which leads to long training times and difficult convergence during network training. This paper proposes a UAV obstacle avoidance algorithm based on a deep recurrent double Q-network. By replacing the single-network structure with a dual-network structure, optimal action selection is decoupled from action-value estimation, reducing overestimation of the value function. A GRU recurrent neural network module is introduced into the fully connected layer; the GRU processes information along the time dimension, enhancing the network's ability to analyze temporal information and improving the algorithm's performance in partially observable environments. On this basis, a prioritized experience replay mechanism is incorporated to accelerate network convergence. Finally, the original and improved algorithms are tested in a simulation environment. The experimental results show that the improved algorithm performs better in terms of training time, obstacle avoidance success rate, and robustness.
AB - Traditional reinforcement learning methods suffer from value-function overestimation and partial observability in robot motion planning, especially in UAV obstacle avoidance, which leads to long training times and difficult convergence during network training. This paper proposes a UAV obstacle avoidance algorithm based on a deep recurrent double Q-network. By replacing the single-network structure with a dual-network structure, optimal action selection is decoupled from action-value estimation, reducing overestimation of the value function. A GRU recurrent neural network module is introduced into the fully connected layer; the GRU processes information along the time dimension, enhancing the network's ability to analyze temporal information and improving the algorithm's performance in partially observable environments. On this basis, a prioritized experience replay mechanism is incorporated to accelerate network convergence. Finally, the original and improved algorithms are tested in a simulation environment. The experimental results show that the improved algorithm performs better in terms of training time, obstacle avoidance success rate, and robustness.
KW - DDQN
KW - deep reinforcement learning
KW - obstacle avoidance
KW - recurrent neural network
KW - UAV
UR - http://www.scopus.com/inward/record.url?scp=85143889633&partnerID=8YFLogxK
U2 - 10.1051/jnwpu/20224050970
DO - 10.1051/jnwpu/20224050970
M3 - Article
AN - SCOPUS:85143889633
SN - 1000-2758
VL - 40
SP - 970
EP - 979
JO - Xibei Gongye Daxue Xuebao/Journal of Northwestern Polytechnical University
JF - Xibei Gongye Daxue Xuebao/Journal of Northwestern Polytechnical University
IS - 5
ER -