TY - JOUR
T1 - Research on a UAV Obstacle Avoidance Algorithm Based on a Deep Recurrent Double Q-Network
AU - Wei, Yao
AU - Liu, Zhicheng
AU - Cai, Bin
AU - Chen, Jiaxin
AU - Yang, Yao
AU - Zhang, Kai
N1 - Publisher Copyright:
©2022 Journal of Northwestern Polytechnical University.
PY - 2022/10
Y1 - 2022/10
N2 - Traditional reinforcement learning methods suffer from value-function overestimation and partial observability in robot motion planning, especially in UAV obstacle avoidance, which leads to long training times and difficult convergence during network training. This paper proposes a UAV obstacle avoidance algorithm based on a deep recurrent double Q-network. By replacing the single-network structure with a dual-network structure, optimal action selection is decoupled from action-value estimation, reducing overestimation of the value function. A GRU recurrent neural network module is introduced into the fully connected layer; the GRU processes information along the time dimension, enhancing the network's ability to analyze temporal information and improving the algorithm's performance in partially observable environments. On this basis, a prioritized experience replay mechanism is incorporated to accelerate network convergence. Finally, the original and improved algorithms are tested in a simulation environment. The experimental results show that the improved algorithm performs better in terms of training time, obstacle avoidance success rate, and robustness.
AB - Traditional reinforcement learning methods suffer from value-function overestimation and partial observability in robot motion planning, especially in UAV obstacle avoidance, which leads to long training times and difficult convergence during network training. This paper proposes a UAV obstacle avoidance algorithm based on a deep recurrent double Q-network. By replacing the single-network structure with a dual-network structure, optimal action selection is decoupled from action-value estimation, reducing overestimation of the value function. A GRU recurrent neural network module is introduced into the fully connected layer; the GRU processes information along the time dimension, enhancing the network's ability to analyze temporal information and improving the algorithm's performance in partially observable environments. On this basis, a prioritized experience replay mechanism is incorporated to accelerate network convergence. Finally, the original and improved algorithms are tested in a simulation environment. The experimental results show that the improved algorithm performs better in terms of training time, obstacle avoidance success rate, and robustness.
KW - DDQN
KW - deep reinforcement learning
KW - obstacle avoidance
KW - recurrent neural network
KW - UAV
UR - http://www.scopus.com/inward/record.url?scp=85143889633&partnerID=8YFLogxK
U2 - 10.1051/jnwpu/20224050970
DO - 10.1051/jnwpu/20224050970
M3 - Article
AN - SCOPUS:85143889633
SN - 1000-2758
VL - 40
SP - 970
EP - 979
JO - Xibei Gongye Daxue Xuebao/Journal of Northwestern Polytechnical University
JF - Xibei Gongye Daxue Xuebao/Journal of Northwestern Polytechnical University
IS - 5
ER -