TY - JOUR
T1 - Asynchronous curriculum experience replay
T2 - A deep reinforcement learning approach for UAV autonomous motion control in unknown dynamic environments
AU - Hu, Zijian
AU - Gao, Xiaoguang
AU - Wan, Kaifang
AU - Wang, Qianglong
AU - Zhai, Yiwei
N1 - Publisher Copyright:
© 1967-2012 IEEE.
PY - 2023/11/1
Y1 - 2023/11/1
N2 - Unmanned aerial vehicles (UAVs) have been widely used in military operations, and realizing safe autonomous motion control (AMC) in complex unknown environments remains a challenge. In this article, we formulate the AMC problem as a Markov decision process (MDP) and propose an advanced deep reinforcement learning (DRL) method that allows UAVs to execute complex tasks in different environments. To overcome the limitations of prioritized experience replay (PER), the proposed asynchronous curriculum experience replay (ACER) uses multiple threads to asynchronously update priorities and assigns true priorities to increase the diversity of experiences. It also applies a temporary pool to enhance learning from new experiences and changes the experience pool's replacement scheme to first-in-useless-out (FIUO) to make better use of old experiences. In addition, combined with curriculum learning (CL), a more reasonable training paradigm is designed for ACER to train UAV agents smoothly. By training in a large-scale dynamic environment constructed from the parameters of a real UAV, ACER improves the convergence speed by 24.66% and the convergence result by 5.59% compared to the twin delayed deep deterministic policy gradient (TD3) algorithm. Testing experiments carried out in environments of different complexity further demonstrate the strong robustness and generalization ability of the ACER agents.
AB - Unmanned aerial vehicles (UAVs) have been widely used in military operations, and realizing safe autonomous motion control (AMC) in complex unknown environments remains a challenge. In this article, we formulate the AMC problem as a Markov decision process (MDP) and propose an advanced deep reinforcement learning (DRL) method that allows UAVs to execute complex tasks in different environments. To overcome the limitations of prioritized experience replay (PER), the proposed asynchronous curriculum experience replay (ACER) uses multiple threads to asynchronously update priorities and assigns true priorities to increase the diversity of experiences. It also applies a temporary pool to enhance learning from new experiences and changes the experience pool's replacement scheme to first-in-useless-out (FIUO) to make better use of old experiences. In addition, combined with curriculum learning (CL), a more reasonable training paradigm is designed for ACER to train UAV agents smoothly. By training in a large-scale dynamic environment constructed from the parameters of a real UAV, ACER improves the convergence speed by 24.66% and the convergence result by 5.59% compared to the twin delayed deep deterministic policy gradient (TD3) algorithm. Testing experiments carried out in environments of different complexity further demonstrate the strong robustness and generalization ability of the ACER agents.
KW - Autonomous motion control
KW - Curriculum learning
KW - Deep reinforcement learning
KW - Experience replay
KW - UAV
UR - http://www.scopus.com/inward/record.url?scp=85162691812&partnerID=8YFLogxK
U2 - 10.1109/TVT.2023.3285595
DO - 10.1109/TVT.2023.3285595
M3 - Article
AN - SCOPUS:85162691812
SN - 0018-9545
VL - 72
SP - 13985
EP - 14001
JO - IEEE Transactions on Vehicular Technology
JF - IEEE Transactions on Vehicular Technology
IS - 11
ER -