TY - JOUR
T1 - Asynchronous curriculum experience replay
T2 - A deep reinforcement learning approach for UAV autonomous motion control in unknown dynamic environments
AU - Hu, Zijian
AU - Gao, Xiaoguang
AU - Wan, Kaifang
AU - Wang, Qianglong
AU - Zhai, Yiwei
N1 - Publisher Copyright:
© 1967-2012 IEEE.
PY - 2023/11/1
Y1 - 2023/11/1
N2 - Unmanned aerial vehicles (UAVs) have been widely used in military operations, and realizing safe autonomous motion control (AMC) in complex unknown environments remains a challenge. In this article, we formulate the AMC problem as a Markov decision process (MDP) and propose an advanced deep reinforcement learning (DRL) method that allows UAVs to execute complex tasks in different environments. To overcome the limitations of prioritized experience replay (PER), the proposed asynchronous curriculum experience replay (ACER) uses multiple threads to asynchronously update priorities and assigns true priorities to increase the diversity of experiences. It also applies a temporary pool to enhance learning from new experiences and changes the experience pool's replacement scheme to first-in-useless-out (FIUO) to make better use of old experiences. In addition, combined with curriculum learning (CL), a more reasonable training paradigm is designed for ACER to train UAV agents smoothly. By training in a large-scale dynamic environment constructed from the parameters of a real UAV, ACER improves the convergence speed by 24.66% and the convergence result by 5.59% compared to the twin delayed deep deterministic policy gradient (TD3) algorithm. Testing experiments carried out in environments of different complexity further demonstrate the strong robustness and generalization ability of the ACER agents.
AB - Unmanned aerial vehicles (UAVs) have been widely used in military operations, and realizing safe autonomous motion control (AMC) in complex unknown environments remains a challenge. In this article, we formulate the AMC problem as a Markov decision process (MDP) and propose an advanced deep reinforcement learning (DRL) method that allows UAVs to execute complex tasks in different environments. To overcome the limitations of prioritized experience replay (PER), the proposed asynchronous curriculum experience replay (ACER) uses multiple threads to asynchronously update priorities and assigns true priorities to increase the diversity of experiences. It also applies a temporary pool to enhance learning from new experiences and changes the experience pool's replacement scheme to first-in-useless-out (FIUO) to make better use of old experiences. In addition, combined with curriculum learning (CL), a more reasonable training paradigm is designed for ACER to train UAV agents smoothly. By training in a large-scale dynamic environment constructed from the parameters of a real UAV, ACER improves the convergence speed by 24.66% and the convergence result by 5.59% compared to the twin delayed deep deterministic policy gradient (TD3) algorithm. Testing experiments carried out in environments of different complexity further demonstrate the strong robustness and generalization ability of the ACER agents.
KW - Autonomous motion control
KW - Curriculum learning
KW - Deep reinforcement learning
KW - Experience replay
KW - UAV
UR - http://www.scopus.com/inward/record.url?scp=85162691812&partnerID=8YFLogxK
U2 - 10.1109/TVT.2023.3285595
DO - 10.1109/TVT.2023.3285595
M3 - Article
AN - SCOPUS:85162691812
SN - 0018-9545
VL - 72
SP - 13985
EP - 14001
JO - IEEE Transactions on Vehicular Technology
JF - IEEE Transactions on Vehicular Technology
IS - 11
ER -