Game Maneuver Decision-Making for Multi-UAV via PPO-A3C-PER Learning Method

Beibei Qiao; Zhenshuai Jia; Bing Xiao; Hanyu Qian

doi:10.1007/978-981-96-2232-0_8

School of Automation

Northwestern Polytechnical University Xian

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

To address long training times, poor unmanned aerial vehicle (UAV) flexibility, and low utilization efficiency of experience pool samples in deep reinforcement learning for multi-UAV systems, a multi-UAV maneuver decision-making method based on continuous strategic action sets is proposed. The PPO-A3C-PER algorithm is proposed to shorten the long training time of the PPO algorithm. Four intelligent maneuvering strategies are proposed to address sluggish UAV performance in the multi-UAV game, and corresponding reward functions are designed for the four strategic behaviors of reconnaissance, pursuit, encirclement, and expulsion. With these, the UAVs can complete roundup tasks in different scenarios, and the reinforcement learning algorithm based on Prioritized Experience Replay (PER) and the Asynchronous Advantage Actor-Critic (A3C) method effectively improves the utilization efficiency of samples in the experience pool. Simulation results show that, compared with the PPO algorithm in the same environment, the algorithm converges faster in the training phase, shortens training time by 39.71%, and improves the targeting rate by 26.32%.

Original language	English
Title of host publication	Advances in Guidance, Navigation and Control - Proceedings of 2024 International Conference on Guidance, Navigation and Control Volume 9
Editors	Liang Yan, Haibin Duan, Yimin Deng
Publisher	Springer Science and Business Media Deutschland GmbH
Pages	72-81
Number of pages	10
ISBN (Print)	9789819622313
DOIs	https://doi.org/10.1007/978-981-96-2232-0_8
State	Published - 2025
Event	International Conference on Guidance, Navigation and Control, ICGNC 2024 - Changsha, China Duration: 9 Aug 2024 → 11 Aug 2024

Publication series

Name	Lecture Notes in Electrical Engineering
Volume	1345 LNEE
ISSN (Print)	1876-1100
ISSN (Electronic)	1876-1119

Conference

Conference	International Conference on Guidance, Navigation and Control, ICGNC 2024
Country/Territory	China
City	Changsha
Period	9/08/24 → 11/08/24

Keywords

A3C
making
maneuver decision
maneuvering strategies
multiple
PER
PPO
reinforcement learning
UAV

Access to Document

10.1007/978-981-96-2232-0_8

Cite this

Qiao, B., Jia, Z., Xiao, B., & Qian, H. (2025). Game Maneuver Decision-Making for Multi-UAV via PPO-A3C-PER Learning Method. In L. Yan, H. Duan, & Y. Deng (Eds.), Advances in Guidance, Navigation and Control - Proceedings of 2024 International Conference on Guidance, Navigation and Control Volume 9 (pp. 72-81). (Lecture Notes in Electrical Engineering; Vol. 1345 LNEE). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-981-96-2232-0_8

Qiao, Beibei ; Jia, Zhenshuai ; Xiao, Bing et al. / Game Maneuver Decision-Making for Multi-UAV via PPO-A3C-PER Learning Method. Advances in Guidance, Navigation and Control - Proceedings of 2024 International Conference on Guidance, Navigation and Control Volume 9. editor / Liang Yan ; Haibin Duan ; Yimin Deng. Springer Science and Business Media Deutschland GmbH, 2025. pp. 72-81 (Lecture Notes in Electrical Engineering).

@inproceedings{1b634236f3424c2286a57ba16edc7804,

title = "Game Maneuver Decision-Making for Multi-UAV via PPO-A3C-PER Learning Method",

abstract = "To address long training times, poor unmanned aerial vehicle (UAV) flexibility, and low utilization efficiency of experience pool samples in deep reinforcement learning for multi-UAV systems, a multi-UAV maneuver decision-making method based on continuous strategic action sets is proposed. The PPO-A3C-PER algorithm is proposed to shorten the long training time of the PPO algorithm. Four intelligent maneuvering strategies are proposed to address sluggish UAV performance in the multi-UAV game, and corresponding reward functions are designed for the four strategic behaviors of reconnaissance, pursuit, encirclement, and expulsion. With these, the UAVs can complete roundup tasks in different scenarios, and the reinforcement learning algorithm based on Prioritized Experience Replay (PER) and the Asynchronous Advantage Actor-Critic (A3C) method effectively improves the utilization efficiency of samples in the experience pool. Simulation results show that, compared with the PPO algorithm in the same environment, the algorithm converges faster in the training phase, shortens training time by 39.71%, and improves the targeting rate by 26.32%.",

keywords = "A3C, making, maneuver decision, maneuvering strategies, multiple, PER, PPO, reinforcement learning, UAV",

author = "Beibei Qiao and Zhenshuai Jia and Bing Xiao and Hanyu Qian",

note = "Publisher Copyright: {\textcopyright} The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025.; International Conference on Guidance, Navigation and Control, ICGNC 2024 ; Conference date: 09-08-2024 Through 11-08-2024",

year = "2025",

doi = "10.1007/978-981-96-2232-0_8",

language = "English",

isbn = "9789819622313",

series = "Lecture Notes in Electrical Engineering",

publisher = "Springer Science and Business Media Deutschland GmbH",

pages = "72--81",

editor = "Liang Yan and Haibin Duan and Yimin Deng",

booktitle = "Advances in Guidance, Navigation and Control - Proceedings of 2024 International Conference on Guidance, Navigation and Control Volume 9",

}

Qiao, B, Jia, Z, Xiao, B & Qian, H 2025, Game Maneuver Decision-Making for Multi-UAV via PPO-A3C-PER Learning Method. in L Yan, H Duan & Y Deng (eds), Advances in Guidance, Navigation and Control - Proceedings of 2024 International Conference on Guidance, Navigation and Control Volume 9. Lecture Notes in Electrical Engineering, vol. 1345 LNEE, Springer Science and Business Media Deutschland GmbH, pp. 72-81, International Conference on Guidance, Navigation and Control, ICGNC 2024, Changsha, China, 9/08/24. https://doi.org/10.1007/978-981-96-2232-0_8

Game Maneuver Decision-Making for Multi-UAV via PPO-A3C-PER Learning Method. / Qiao, Beibei; Jia, Zhenshuai; Xiao, Bing et al.
Advances in Guidance, Navigation and Control - Proceedings of 2024 International Conference on Guidance, Navigation and Control Volume 9. ed. / Liang Yan; Haibin Duan; Yimin Deng. Springer Science and Business Media Deutschland GmbH, 2025. p. 72-81 (Lecture Notes in Electrical Engineering; Vol. 1345 LNEE).

TY - GEN

T1 - Game Maneuver Decision-Making for Multi-UAV via PPO-A3C-PER Learning Method

AU - Qiao, Beibei

AU - Jia, Zhenshuai

AU - Xiao, Bing

AU - Qian, Hanyu

N1 - Publisher Copyright: © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025.

PY - 2025

Y1 - 2025

N2 - To address long training times, poor unmanned aerial vehicle (UAV) flexibility, and low utilization efficiency of experience pool samples in deep reinforcement learning for multi-UAV systems, a multi-UAV maneuver decision-making method based on continuous strategic action sets is proposed. The PPO-A3C-PER algorithm is proposed to shorten the long training time of the PPO algorithm. Four intelligent maneuvering strategies are proposed to address sluggish UAV performance in the multi-UAV game, and corresponding reward functions are designed for the four strategic behaviors of reconnaissance, pursuit, encirclement, and expulsion. With these, the UAVs can complete roundup tasks in different scenarios, and the reinforcement learning algorithm based on Prioritized Experience Replay (PER) and the Asynchronous Advantage Actor-Critic (A3C) method effectively improves the utilization efficiency of samples in the experience pool. Simulation results show that, compared with the PPO algorithm in the same environment, the algorithm converges faster in the training phase, shortens training time by 39.71%, and improves the targeting rate by 26.32%.

AB - To address long training times, poor unmanned aerial vehicle (UAV) flexibility, and low utilization efficiency of experience pool samples in deep reinforcement learning for multi-UAV systems, a multi-UAV maneuver decision-making method based on continuous strategic action sets is proposed. The PPO-A3C-PER algorithm is proposed to shorten the long training time of the PPO algorithm. Four intelligent maneuvering strategies are proposed to address sluggish UAV performance in the multi-UAV game, and corresponding reward functions are designed for the four strategic behaviors of reconnaissance, pursuit, encirclement, and expulsion. With these, the UAVs can complete roundup tasks in different scenarios, and the reinforcement learning algorithm based on Prioritized Experience Replay (PER) and the Asynchronous Advantage Actor-Critic (A3C) method effectively improves the utilization efficiency of samples in the experience pool. Simulation results show that, compared with the PPO algorithm in the same environment, the algorithm converges faster in the training phase, shortens training time by 39.71%, and improves the targeting rate by 26.32%.

KW - A3C

KW - making

KW - maneuver decision

KW - maneuvering strategies

KW - multiple

KW - PER

KW - PPO

KW - reinforcement learning

KW - UAV

UR - http://www.scopus.com/inward/record.url?scp=105006464636&partnerID=8YFLogxK

U2 - 10.1007/978-981-96-2232-0_8

DO - 10.1007/978-981-96-2232-0_8

M3 - Conference contribution

AN - SCOPUS:105006464636

SN - 9789819622313

T3 - Lecture Notes in Electrical Engineering

SP - 72

EP - 81

BT - Advances in Guidance, Navigation and Control - Proceedings of 2024 International Conference on Guidance, Navigation and Control Volume 9

A2 - Yan, Liang

A2 - Duan, Haibin

A2 - Deng, Yimin

PB - Springer Science and Business Media Deutschland GmbH

T2 - International Conference on Guidance, Navigation and Control, ICGNC 2024

Y2 - 9 August 2024 through 11 August 2024

ER -

Qiao B, Jia Z, Xiao B, Qian H. Game Maneuver Decision-Making for Multi-UAV via PPO-A3C-PER Learning Method. In Yan L, Duan H, Deng Y, editors, Advances in Guidance, Navigation and Control - Proceedings of 2024 International Conference on Guidance, Navigation and Control Volume 9. Springer Science and Business Media Deutschland GmbH. 2025. p. 72-81. (Lecture Notes in Electrical Engineering). doi: 10.1007/978-981-96-2232-0_8
