Multiple-UAV Reinforcement Learning Algorithm Based on Improved PPO in Ray Framework

Guang Zhan; Xinmiao Zhang; Zhongchao Li; Lin Xu; Deyun Zhou; Zhen Yang

doi:10.3390/drones6070166

School of Electronics and Information

Research output: Contribution to journal › Article › peer-review

42 Scopus citations

Abstract

Distributed multi-agent collaborative decision-making technology is key to general artificial intelligence. This paper uses a self-developed Unity3D collaborative combat environment as the test scenario and sets a task that requires heterogeneous unmanned aerial vehicles (UAVs) to make distributed decisions and complete a cooperative task. To address the poor performance of the traditional proximal policy optimization (PPO) algorithm in complex multi-agent collaboration scenarios, the Critic network in PPO is modified to learn a centralized value function, and the resulting multi-agent proximal policy optimization (MAPPO) algorithm is built on the distributed training framework Ray. In addition, an inheritance training method based on curriculum learning is adopted to improve the generalization performance of the algorithm. In the experiments, MAPPO obtains the highest average accumulated reward among the compared algorithms and completes the task goal in the fewest steps after convergence, which the authors take as evidence that MAPPO outperforms the state of the art.
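
The two ideas named in the abstract, a Critic that learns a centralized value function and the PPO update, can be illustrated with a small sketch. This is not the authors' implementation: the agent names, the fixed agent ordering, the toy linear critic, and the clipping value of 0.2 are all assumptions made for illustration, and the record does not describe the paper's Ray or Unity3D setup in enough detail to reproduce it.

```python
from typing import Dict, List


def joint_observation(obs_by_agent: Dict[str, List[float]]) -> List[float]:
    """Concatenate per-agent observations in a fixed (sorted) agent order.

    A centralized critic is assumed here to see every agent's observation,
    while each actor would still act on its own local observation only.
    """
    joint: List[float] = []
    for agent_id in sorted(obs_by_agent):
        joint.extend(obs_by_agent[agent_id])
    return joint


def linear_value(weights: List[float], bias: float, features: List[float]) -> float:
    """Toy linear critic V(s) = w . s + b, a stand-in for the Critic network."""
    if len(weights) != len(features):
        raise ValueError("weights and features must have the same length")
    return sum(w * x for w, x in zip(weights, features)) + bias


def ppo_clipped_objective(ratio: float, advantage: float, clip_eps: float = 0.2) -> float:
    """Standard PPO clipped surrogate for a single sample.

    ratio is pi_new(a|o) / pi_old(a|o); the objective is the minimum of the
    unclipped and clipped terms, which limits the size of a policy update.
    """
    clipped_ratio = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
    return min(ratio * advantage, clipped_ratio * advantage)


if __name__ == "__main__":
    # Hypothetical two-UAV example with made-up observation vectors.
    observations = {"uav_b": [3.0], "uav_a": [1.0, 2.0]}
    state = joint_observation(observations)
    value = linear_value([0.5, 0.5, 1.0], 0.1, state)
    print("joint state:", state)
    print("centralized value estimate:", value)
    print("clipped objective:", ppo_clipped_objective(1.5, 2.0))
```

In this sketch only the critic input changes relative to independent PPO: the advantage fed to `ppo_clipped_objective` would be computed from the centralized value estimate, while the policy ratio stays per agent.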

Original language	English
Article number	166
Journal	Drones
Volume	6
Issue number	7
DOIs	https://doi.org/10.3390/drones6070166
State	Published - Jul 2022

Keywords

PPO
Ray
curriculum learning
deep reinforcement learning
multiple UAVs

Cite this

@article{372a1eefbe254ed6a8aadf1c4acb55bb,

title = "Multiple-UAV Reinforcement Learning Algorithm Based on Improved PPO in Ray Framework",

abstract = "Distributed multi-agent collaborative decision-making technology is key to general artificial intelligence. This paper uses a self-developed Unity3D collaborative combat environment as the test scenario and sets a task that requires heterogeneous unmanned aerial vehicles (UAVs) to make distributed decisions and complete a cooperative task. To address the poor performance of the traditional proximal policy optimization (PPO) algorithm in complex multi-agent collaboration scenarios, the Critic network in PPO is modified to learn a centralized value function, and the resulting multi-agent proximal policy optimization (MAPPO) algorithm is built on the distributed training framework Ray. In addition, an inheritance training method based on curriculum learning is adopted to improve the generalization performance of the algorithm. In the experiments, MAPPO obtains the highest average accumulated reward among the compared algorithms and completes the task goal in the fewest steps after convergence, which the authors take as evidence that MAPPO outperforms the state of the art.",

keywords = "PPO, Ray, curriculum learning, deep reinforcement learning, multiple UAVs",

author = "Guang Zhan and Xinmiao Zhang and Zhongchao Li and Lin Xu and Deyun Zhou and Zhen Yang",

note = "Publisher Copyright: {\textcopyright} 2022 by the authors. Licensee MDPI, Basel, Switzerland.",

year = "2022",

month = jul,

doi = "10.3390/drones6070166",

language = "English",

volume = "6",

journal = "Drones",

issn = "2504-446X",

publisher = "Multidisciplinary Digital Publishing Institute (MDPI)",

number = "7",

}

TY - JOUR

T1 - Multiple-UAV Reinforcement Learning Algorithm Based on Improved PPO in Ray Framework

AU - Zhan, Guang

AU - Zhang, Xinmiao

AU - Li, Zhongchao

AU - Xu, Lin

AU - Zhou, Deyun

AU - Yang, Zhen

PY - 2022/7

Y1 - 2022/7

N2 - Distributed multi-agent collaborative decision-making technology is key to general artificial intelligence. This paper uses a self-developed Unity3D collaborative combat environment as the test scenario and sets a task that requires heterogeneous unmanned aerial vehicles (UAVs) to make distributed decisions and complete a cooperative task. To address the poor performance of the traditional proximal policy optimization (PPO) algorithm in complex multi-agent collaboration scenarios, the Critic network in PPO is modified to learn a centralized value function, and the resulting multi-agent proximal policy optimization (MAPPO) algorithm is built on the distributed training framework Ray. In addition, an inheritance training method based on curriculum learning is adopted to improve the generalization performance of the algorithm. In the experiments, MAPPO obtains the highest average accumulated reward among the compared algorithms and completes the task goal in the fewest steps after convergence, which the authors take as evidence that MAPPO outperforms the state of the art.

AB - Distributed multi-agent collaborative decision-making technology is key to general artificial intelligence. This paper uses a self-developed Unity3D collaborative combat environment as the test scenario and sets a task that requires heterogeneous unmanned aerial vehicles (UAVs) to make distributed decisions and complete a cooperative task. To address the poor performance of the traditional proximal policy optimization (PPO) algorithm in complex multi-agent collaboration scenarios, the Critic network in PPO is modified to learn a centralized value function, and the resulting multi-agent proximal policy optimization (MAPPO) algorithm is built on the distributed training framework Ray. In addition, an inheritance training method based on curriculum learning is adopted to improve the generalization performance of the algorithm. In the experiments, MAPPO obtains the highest average accumulated reward among the compared algorithms and completes the task goal in the fewest steps after convergence, which the authors take as evidence that MAPPO outperforms the state of the art.

KW - PPO

KW - Ray

KW - curriculum learning

KW - deep reinforcement learning

KW - multiple UAVs

UR - http://www.scopus.com/inward/record.url?scp=85133692904&partnerID=8YFLogxK

U2 - 10.3390/drones6070166

DO - 10.3390/drones6070166

M3 - Article

AN - SCOPUS:85133692904

SN - 2504-446X

VL - 6

JO - Drones

JF - Drones

IS - 7

M1 - 166

ER -
