TY - JOUR
T1 - Research on multi-UAV task decision-making based on improved MADDPG algorithm and transfer learning
AU - Li, Bo
AU - Liang, Shiyang
AU - Gan, Zhigang
AU - Chen, Daqing
AU - Gao, Peixin
N1 - Publisher Copyright:
© 2021 Inderscience Enterprises Ltd.
PY - 2021
Y1 - 2021
N2 - At present, intelligent algorithms for multi-UAV task decision-making suffer from major issues such as slow learning speed and poor generalisation capability, which make it difficult to obtain the expected learning results within a reasonable time and to apply a trained model in a new environment. To address these problems, an improved algorithm, namely PMADDPG, based on multi-agent deep deterministic policy gradient (MADDPG), is proposed in this paper. This algorithm adopts a two-layer experience pool structure to achieve prioritised experience replay. Experiences are first stored in the first-layer experience pool; experiences more conducive to training and learning are then selected according to priority criteria and placed into the second-layer experience pool, from which experiences are drawn for model training with the PMADDPG algorithm. In addition, a model-based environment transfer learning method is designed to improve the generalisation capability of the algorithm. Comparative experiments show that, compared with the MADDPG algorithm, the proposed algorithms significantly improve learning speed, task success rate and generalisation capability.
AB - At present, intelligent algorithms for multi-UAV task decision-making suffer from major issues such as slow learning speed and poor generalisation capability, which make it difficult to obtain the expected learning results within a reasonable time and to apply a trained model in a new environment. To address these problems, an improved algorithm, namely PMADDPG, based on multi-agent deep deterministic policy gradient (MADDPG), is proposed in this paper. This algorithm adopts a two-layer experience pool structure to achieve prioritised experience replay. Experiences are first stored in the first-layer experience pool; experiences more conducive to training and learning are then selected according to priority criteria and placed into the second-layer experience pool, from which experiences are drawn for model training with the PMADDPG algorithm. In addition, a model-based environment transfer learning method is designed to improve the generalisation capability of the algorithm. Comparative experiments show that, compared with the MADDPG algorithm, the proposed algorithms significantly improve learning speed, task success rate and generalisation capability.
KW - Improved MADDPG algorithm
KW - Multi-UAV task decision
KW - Transfer learning
KW - Two-layer experience pool
UR - http://www.scopus.com/inward/record.url?scp=85117201924&partnerID=8YFLogxK
U2 - 10.1504/IJBIC.2021.118087
DO - 10.1504/IJBIC.2021.118087
M3 - Article
AN - SCOPUS:85117201924
SN - 1758-0366
VL - 18
SP - 82
EP - 91
JO - International Journal of Bio-Inspired Computation
JF - International Journal of Bio-Inspired Computation
IS - 2
ER -