Relevant experience learning: A deep reinforcement learning method for UAV autonomous motion planning in complex unknown environments

Zijian HU; Xiaoguang GAO; Kaifang WAN; Yiwei ZHAI; Qianglong WANG

doi:10.1016/j.cja.2020.12.027

School of Electronics and Information

Northwestern Polytechnical University Xian

Research output: Contribution to journal › Article › peer-review

69 Scopus citations

Abstract

Unmanned Aerial Vehicles (UAVs) play a vital role in military warfare. In a variety of battlefield mission scenarios, UAVs are required to safely fly to designated locations without human intervention. Therefore, finding a suitable method to solve the UAV Autonomous Motion Planning (AMP) problem can improve the success rate of UAV missions to a certain extent. In recent years, many studies have used Deep Reinforcement Learning (DRL) methods to address the AMP problem and have achieved good results. From the perspective of sampling, this paper designs a sampling method with double-screening, combines it with the Deep Deterministic Policy Gradient (DDPG) algorithm, and proposes the Relevant Experience Learning-DDPG (REL-DDPG) algorithm. The REL-DDPG algorithm uses a Prioritized Experience Replay (PER) mechanism to break the correlation of continuous experiences in the experience pool, finds the experiences most similar to the current state to learn according to the theory in human education, and expands the influence of the learning process on action selection at the current state. All experiments are applied in a complex unknown simulation environment constructed based on the parameters of a real UAV. The training experiments show that REL-DDPG improves the convergence speed and the convergence result compared to the state-of-the-art DDPG algorithm, while the testing experiments show the applicability of the algorithm and investigate the performance under different parameter conditions.
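The double-screening idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not give the sampling details, so the function name `double_screen_sample`, the exponent `alpha`, the candidate-pool size, and the use of Euclidean distance as the similarity measure are all assumptions made for illustration. The first screen draws candidates in proportion to their priority (in the style of PER), and the second screen keeps the candidates whose stored states lie closest to the current state.

```python
import numpy as np


def double_screen_sample(states, priorities, current_state,
                         n_candidates, batch_size, alpha=0.6, rng=None):
    """Hypothetical two-stage sampler (illustrative only, not the paper's code).

    Returns the buffer indices of the selected experiences.
    """
    rng = np.random.default_rng() if rng is None else rng
    states = np.asarray(states, dtype=float)
    priorities = np.asarray(priorities, dtype=float)

    # Screen 1 (PER-style, assumed form): draw candidates without replacement,
    # with probability proportional to priority ** alpha.
    probs = priorities ** alpha
    probs /= probs.sum()
    candidates = rng.choice(len(states), size=n_candidates,
                            replace=False, p=probs)

    # Screen 2 (assumed similarity measure): keep the candidates whose stored
    # state has the smallest Euclidean distance to the current state.
    target = np.asarray(current_state, dtype=float)
    dists = np.linalg.norm(states[candidates] - target, axis=1)
    nearest = np.argsort(dists)[:batch_size]
    return candidates[nearest]


if __name__ == "__main__":
    # Toy buffer: ten one-dimensional states 0..9 with equal priorities.
    buffer_states = np.arange(10, dtype=float).reshape(-1, 1)
    buffer_priorities = np.ones(10)
    picked = double_screen_sample(buffer_states, buffer_priorities,
                                  current_state=[4.0],
                                  n_candidates=10, batch_size=3)
    print(sorted(picked.tolist()))
```

In a DDPG training loop, the returned indices would select the mini-batch used for the critic and actor updates; how priorities are computed and refreshed is left out here because the abstract does not specify it.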

Original language	English
Pages (from-to)	187-204
Number of pages	18
Journal	Chinese Journal of Aeronautics
Volume	34
Issue number	12
DOIs	https://doi.org/10.1016/j.cja.2020.12.027
State	Published - Dec 2021

Keywords

Autonomous Motion Planning (AMP)
Deep Deterministic Policy Gradient (DDPG)
Deep Reinforcement Learning (DRL)
Sampling method
UAV

Access to Document

10.1016/j.cja.2020.12.027

Cite this

@article{f541641fd1ae438a99d42c45184c52e2,

title = "Relevant experience learning: A deep reinforcement learning method for UAV autonomous motion planning in complex unknown environments",

abstract = "Unmanned Aerial Vehicles (UAVs) play a vital role in military warfare. In a variety of battlefield mission scenarios, UAVs are required to safely fly to designated locations without human intervention. Therefore, finding a suitable method to solve the UAV Autonomous Motion Planning (AMP) problem can improve the success rate of UAV missions to a certain extent. In recent years, many studies have used Deep Reinforcement Learning (DRL) methods to address the AMP problem and have achieved good results. From the perspective of sampling, this paper designs a sampling method with double-screening, combines it with the Deep Deterministic Policy Gradient (DDPG) algorithm, and proposes the Relevant Experience Learning-DDPG (REL-DDPG) algorithm. The REL-DDPG algorithm uses a Prioritized Experience Replay (PER) mechanism to break the correlation of continuous experiences in the experience pool, finds the experiences most similar to the current state to learn according to the theory in human education, and expands the influence of the learning process on action selection at the current state. All experiments are applied in a complex unknown simulation environment constructed based on the parameters of a real UAV. The training experiments show that REL-DDPG improves the convergence speed and the convergence result compared to the state-of-the-art DDPG algorithm, while the testing experiments show the applicability of the algorithm and investigate the performance under different parameter conditions.",

keywords = "Autonomous Motion Planning (AMP), Deep Deterministic Policy Gradient (DDPG), Deep Reinforcement Learning (DRL), Sampling method, UAV",

author = "Zijian HU and Xiaoguang GAO and Kaifang WAN and Yiwei ZHAI and Qianglong WANG",

note = "Publisher Copyright: {\textcopyright} 2021 Chinese Society of Aeronautics and Astronautics",

year = "2021",

month = dec,

doi = "10.1016/j.cja.2020.12.027",

language = "English",

volume = "34",

pages = "187--204",

journal = "Chinese Journal of Aeronautics",

issn = "1000-9361",

publisher = "Elsevier B.V.",

number = "12",

}

TY - JOUR

T1 - Relevant experience learning

T2 - A deep reinforcement learning method for UAV autonomous motion planning in complex unknown environments

AU - HU, Zijian

AU - GAO, Xiaoguang

AU - WAN, Kaifang

AU - ZHAI, Yiwei

AU - WANG, Qianglong

PY - 2021/12

Y1 - 2021/12

N2 - Unmanned Aerial Vehicles (UAVs) play a vital role in military warfare. In a variety of battlefield mission scenarios, UAVs are required to safely fly to designated locations without human intervention. Therefore, finding a suitable method to solve the UAV Autonomous Motion Planning (AMP) problem can improve the success rate of UAV missions to a certain extent. In recent years, many studies have used Deep Reinforcement Learning (DRL) methods to address the AMP problem and have achieved good results. From the perspective of sampling, this paper designs a sampling method with double-screening, combines it with the Deep Deterministic Policy Gradient (DDPG) algorithm, and proposes the Relevant Experience Learning-DDPG (REL-DDPG) algorithm. The REL-DDPG algorithm uses a Prioritized Experience Replay (PER) mechanism to break the correlation of continuous experiences in the experience pool, finds the experiences most similar to the current state to learn according to the theory in human education, and expands the influence of the learning process on action selection at the current state. All experiments are applied in a complex unknown simulation environment constructed based on the parameters of a real UAV. The training experiments show that REL-DDPG improves the convergence speed and the convergence result compared to the state-of-the-art DDPG algorithm, while the testing experiments show the applicability of the algorithm and investigate the performance under different parameter conditions.

AB - Unmanned Aerial Vehicles (UAVs) play a vital role in military warfare. In a variety of battlefield mission scenarios, UAVs are required to safely fly to designated locations without human intervention. Therefore, finding a suitable method to solve the UAV Autonomous Motion Planning (AMP) problem can improve the success rate of UAV missions to a certain extent. In recent years, many studies have used Deep Reinforcement Learning (DRL) methods to address the AMP problem and have achieved good results. From the perspective of sampling, this paper designs a sampling method with double-screening, combines it with the Deep Deterministic Policy Gradient (DDPG) algorithm, and proposes the Relevant Experience Learning-DDPG (REL-DDPG) algorithm. The REL-DDPG algorithm uses a Prioritized Experience Replay (PER) mechanism to break the correlation of continuous experiences in the experience pool, finds the experiences most similar to the current state to learn according to the theory in human education, and expands the influence of the learning process on action selection at the current state. All experiments are applied in a complex unknown simulation environment constructed based on the parameters of a real UAV. The training experiments show that REL-DDPG improves the convergence speed and the convergence result compared to the state-of-the-art DDPG algorithm, while the testing experiments show the applicability of the algorithm and investigate the performance under different parameter conditions.

KW - Autonomous Motion Planning (AMP)

KW - Deep Deterministic Policy Gradient (DDPG)

KW - Deep Reinforcement Learning (DRL)

KW - Sampling method

KW - UAV

UR - http://www.scopus.com/inward/record.url?scp=85109444094&partnerID=8YFLogxK

U2 - 10.1016/j.cja.2020.12.027

DO - 10.1016/j.cja.2020.12.027

M3 - Article

AN - SCOPUS:85109444094

SN - 1000-9361

VL - 34

SP - 187

EP - 204

JO - Chinese Journal of Aeronautics

JF - Chinese Journal of Aeronautics

IS - 12

ER -