TY - JOUR
T1 - Research on multi-UAV task decision-making based on improved MADDPG algorithm and transfer learning
AU - Li, Bo
AU - Liang, Shiyang
AU - Gan, Zhigang
AU - Chen, Daqing
AU - Gao, Peixin
N1 - Publisher Copyright:
© 2021 Inderscience Enterprises Ltd.
PY - 2021
Y1 - 2021
N2 - At present, intelligent algorithms for multi-UAV task decision-making suffer from major issues such as slow learning speed and poor generalisation capability, which make it difficult to obtain the expected learning results within a reasonable time and to apply a trained model in a new environment. To address these problems, an improved algorithm, namely PMADDPG, based on multi-agent deep deterministic policy gradient (MADDPG), is proposed in this paper. This algorithm adopts a two-layer experience pool structure to achieve prioritised experience replay. Experiences are first stored in the first-layer experience pool; experiences more conducive to training and learning are then selected according to priority criteria and placed into the second-layer experience pool, from which experiences are drawn for model training with the PMADDPG algorithm. In addition, a model-based environment transfer learning method is designed to improve the generalisation capability of the algorithm. Comparative experiments show that, compared with the MADDPG algorithm, the proposed algorithms significantly improve learning speed, task success rate and generalisation capability.
AB - At present, intelligent algorithms for multi-UAV task decision-making suffer from major issues such as slow learning speed and poor generalisation capability, which make it difficult to obtain the expected learning results within a reasonable time and to apply a trained model in a new environment. To address these problems, an improved algorithm, namely PMADDPG, based on multi-agent deep deterministic policy gradient (MADDPG), is proposed in this paper. This algorithm adopts a two-layer experience pool structure to achieve prioritised experience replay. Experiences are first stored in the first-layer experience pool; experiences more conducive to training and learning are then selected according to priority criteria and placed into the second-layer experience pool, from which experiences are drawn for model training with the PMADDPG algorithm. In addition, a model-based environment transfer learning method is designed to improve the generalisation capability of the algorithm. Comparative experiments show that, compared with the MADDPG algorithm, the proposed algorithms significantly improve learning speed, task success rate and generalisation capability.
KW - Improved MADDPG algorithm
KW - Multi-UAV task decision
KW - Transfer learning
KW - Two-layer experience pool
UR - http://www.scopus.com/inward/record.url?scp=85117201924&partnerID=8YFLogxK
U2 - 10.1504/IJBIC.2021.118087
DO - 10.1504/IJBIC.2021.118087
M3 - Article
AN - SCOPUS:85117201924
SN - 1758-0366
VL - 18
SP - 82
EP - 91
JO - International Journal of Bio-Inspired Computation
JF - International Journal of Bio-Inspired Computation
IS - 2
ER -