Research on multi-UAV task decision-making based on improved MADDPG algorithm and transfer learning

Bo Li; Shiyang Liang; Zhigang Gan; Daqing Chen; Peixin Gao

doi:10.1504/IJBIC.2021.118087

School of Electronics and Information

Research output: Contribution to journal › Article › peer-review

16 Citations (Scopus)

Abstract

At present, intelligent algorithms for multi-UAV task decision-making suffer from major issues such as slow learning speed and poor generalisation capability, which make it difficult to obtain the expected learning results within a reasonable time and to apply a trained model in a new environment. To address these problems, this paper proposes an improved algorithm, PMADDPG, based on multi-agent deep deterministic policy gradient (MADDPG). The algorithm adopts a two-layer experience pool structure to achieve prioritised experience replay. Experiences are first stored in the first-layer experience pool; those more conducive to training and learning are then selected according to priority criteria and placed in the second-layer experience pool. Experiences from the second-layer pool are then selected for model training with the PMADDPG algorithm. In addition, a model-based environment transfer learning method is designed to improve the generalisation capability of the algorithm. Comparative experiments show that, compared with the MADDPG algorithm, the proposed algorithms can significantly improve learning speed, task success rate and generalisation capability.
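
The two-layer experience pool described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not state the priority criteria, so the sketch assumes each experience arrives with a numeric priority score (for example, an absolute TD error), and the class name `TwoLayerReplay` and its methods `store`, `promote` and `sample` are hypothetical.

```python
import random
from collections import deque


class TwoLayerReplay:
    """Illustrative two-layer experience pool (hypothetical, not the paper's code)."""

    def __init__(self, first_capacity, second_capacity, seed=0):
        self.first = deque(maxlen=first_capacity)    # layer 1: all raw experiences
        self.second = deque(maxlen=second_capacity)  # layer 2: prioritised subset
        self.rng = random.Random(seed)

    def store(self, transition, priority):
        # Every experience enters the first-layer pool with its priority score.
        # Assumption: the caller supplies the score (e.g. an absolute TD error).
        self.first.append((priority, transition))

    def promote(self, k):
        # Move the k highest-priority experiences from layer 1 into layer 2.
        ranked = sorted(self.first, key=lambda item: item[0], reverse=True)
        chosen = ranked[:k]
        chosen_ids = {id(item) for item in chosen}
        self.first = deque(
            (item for item in self.first if id(item) not in chosen_ids),
            maxlen=self.first.maxlen,
        )
        for _, transition in chosen:
            self.second.append(transition)
        return len(chosen)

    def sample(self, batch_size):
        # Training batches are drawn uniformly from the second-layer pool only.
        n = min(batch_size, len(self.second))
        return self.rng.sample(list(self.second), n)


# Usage: store four toy experiences, promote the two with the highest priority.
pool = TwoLayerReplay(first_capacity=10, second_capacity=3)
for i, p in enumerate([0.1, 0.9, 0.5, 0.7]):
    pool.store(("s%d" % i,), p)
moved = pool.promote(2)
batch = pool.sample(5)
```

Sampling uniformly from the second layer is a simplification chosen for this sketch; the paper may weight the second-layer draw differently.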

Original language	English
Pages (from-to)	82-91
Number of pages	10
Journal	International Journal of Bio-Inspired Computation
Volume	18
Issue number	2
DOI	https://doi.org/10.1504/IJBIC.2021.118087
Publication status	Published - 2021

Access to Document

10.1504/IJBIC.2021.118087

Cite this

@article{d3d07a3463f64310b1c9019750e2dca3,

title = "Research on multi-UAV task decision-making based on improved MADDPG algorithm and transfer learning",

abstract = "At present, the intelligent algorithms of multi-UAV task decision-making have been suffering some major issues, such as, slow learning speed and poor generalisation capability, and these issues have made it difficult to obtain expected learning results within a reasonable time and to apply a trained model in a new environment. To address these problems, an improved algorithm, namely PMADDPG, based on multi-agent deep deterministic policy gradient (MADDPG) is proposed in this paper. This algorithm adopts a two-layer experience pool structure in order to achieve the priority experience replay. Experiences are stored in an experience pool of the first layer, and then, experiences more conducive to training and learning are selected according to priority criteria and put into an experience pool of the second layer. Furthermore, the experiences from the experience pool of the second layer are selected for model training based on PMADDPG algorithm. In addition, a model-based environment transfer learning method is designed to improve the generalisation capability of the algorithm. Comparative experiments have shown that, compared with MADDPG algorithm, proposed algorithms can scientifically improve the learning speed, task success rate and generalisation capability.",

keywords = "Improved MADDPG algorithm, Multi-UAV task decision, Transfer learning, Two-layer experience pool",

author = "Bo Li and Shiyang Liang and Zhigang Gan and Daqing Chen and Peixin Gao",

note = "Publisher Copyright: {\textcopyright} 2021 Inderscience Enterprises Ltd.",

year = "2021",

doi = "10.1504/IJBIC.2021.118087",

language = "English",

volume = "18",

pages = "82--91",

journal = "International Journal of Bio-Inspired Computation",

issn = "1758-0366",

publisher = "Inderscience",

number = "2",

}

TY - JOUR

T1 - Research on multi-UAV task decision-making based on improved MADDPG algorithm and transfer learning

AU - Li, Bo

AU - Liang, Shiyang

AU - Gan, Zhigang

AU - Chen, Daqing

AU - Gao, Peixin

PY - 2021

Y1 - 2021

N2 - At present, the intelligent algorithms of multi-UAV task decision-making have been suffering some major issues, such as, slow learning speed and poor generalisation capability, and these issues have made it difficult to obtain expected learning results within a reasonable time and to apply a trained model in a new environment. To address these problems, an improved algorithm, namely PMADDPG, based on multi-agent deep deterministic policy gradient (MADDPG) is proposed in this paper. This algorithm adopts a two-layer experience pool structure in order to achieve the priority experience replay. Experiences are stored in an experience pool of the first layer, and then, experiences more conducive to training and learning are selected according to priority criteria and put into an experience pool of the second layer. Furthermore, the experiences from the experience pool of the second layer are selected for model training based on PMADDPG algorithm. In addition, a model-based environment transfer learning method is designed to improve the generalisation capability of the algorithm. Comparative experiments have shown that, compared with MADDPG algorithm, proposed algorithms can scientifically improve the learning speed, task success rate and generalisation capability.

AB - At present, the intelligent algorithms of multi-UAV task decision-making have been suffering some major issues, such as, slow learning speed and poor generalisation capability, and these issues have made it difficult to obtain expected learning results within a reasonable time and to apply a trained model in a new environment. To address these problems, an improved algorithm, namely PMADDPG, based on multi-agent deep deterministic policy gradient (MADDPG) is proposed in this paper. This algorithm adopts a two-layer experience pool structure in order to achieve the priority experience replay. Experiences are stored in an experience pool of the first layer, and then, experiences more conducive to training and learning are selected according to priority criteria and put into an experience pool of the second layer. Furthermore, the experiences from the experience pool of the second layer are selected for model training based on PMADDPG algorithm. In addition, a model-based environment transfer learning method is designed to improve the generalisation capability of the algorithm. Comparative experiments have shown that, compared with MADDPG algorithm, proposed algorithms can scientifically improve the learning speed, task success rate and generalisation capability.

KW - Improved MADDPG algorithm

KW - Multi-UAV task decision

KW - Transfer learning

KW - Two-layer experience pool

UR - http://www.scopus.com/inward/record.url?scp=85117201924&partnerID=8YFLogxK

U2 - 10.1504/IJBIC.2021.118087

DO - 10.1504/IJBIC.2021.118087

M3 - Article

AN - SCOPUS:85117201924

SN - 1758-0366

VL - 18

SP - 82

EP - 91

JO - International Journal of Bio-Inspired Computation

JF - International Journal of Bio-Inspired Computation

IS - 2

ER -
