Game Maneuver Decision-Making for Multi-UAV via PPO-A3C-PER Learning Method

Beibei Qiao; Zhenshuai Jia; Bing Xiao; Hanyu Qian

doi:10.1007/978-981-96-2232-0_8


School of Automation

Northwestern Polytechnical University, Xi'an

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

To address long training times, poor unmanned aerial vehicle (UAV) flexibility, and low utilization efficiency of experience pool samples in deep reinforcement learning for multi-UAV systems, a multi-UAV maneuver decision-making method based on continuous strategic action sets is proposed. The PPO-A3C-PER algorithm is proposed to shorten the long training time of the PPO algorithm. Four intelligent maneuvering strategies are proposed to address sluggish UAV performance in the multi-UAV game, and corresponding reward functions are designed for the four strategic behaviors of reconnaissance, pursuit, encirclement, and expulsion. With these, UAVs can complete roundup tasks in different scenarios, and the reinforcement learning algorithm based on Prioritized Experience Replay and the Asynchronous Advantage Actor-Critic method effectively improves the utilization efficiency of samples in the experience pool. Simulation results show that the algorithm converges faster than the PPO algorithm in the training phase: compared with PPO in the same environment, the training time is shortened by 39.71% and the targeting rate is improved by 26.32%.
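The abstract credits Prioritized Experience Replay (PER) with better use of experience pool samples but does not give the mechanism. As a rough illustration only, the sketch below shows generic proportional PER: transitions are sampled with probability proportional to their priority raised to a power, and importance-sampling weights correct the resulting bias. It is not the authors' PPO-A3C-PER implementation; the class name, the parameters `alpha`, `beta`, and `eps`, and their default values are assumptions chosen for the example.

```python
import numpy as np


class PrioritizedReplayBuffer:
    """Generic proportional PER sketch (hypothetical, not the paper's code)."""

    def __init__(self, capacity, alpha=0.6, eps=1e-6, seed=0):
        self.capacity = capacity
        self.alpha = alpha      # assumed: how strongly priorities skew sampling
        self.eps = eps          # assumed: keeps every priority strictly positive
        self.data = []
        self.priorities = np.zeros(capacity, dtype=np.float64)
        self.pos = 0            # next slot to write (ring buffer)
        self.rng = np.random.default_rng(seed)

    def add(self, transition):
        # New transitions get the current maximum priority so that each one
        # has a good chance of being replayed at least once.
        max_p = self.priorities[:len(self.data)].max() if self.data else 1.0
        if len(self.data) < self.capacity:
            self.data.append(transition)
        else:
            self.data[self.pos] = transition  # overwrite the oldest entry
        self.priorities[self.pos] = max_p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4):
        n = len(self.data)
        scaled = self.priorities[:n] ** self.alpha
        probs = scaled / scaled.sum()
        idx = self.rng.choice(n, size=batch_size, p=probs)
        # Importance-sampling weights, normalized by the largest weight
        # in the sampled batch.
        weights = (n * probs[idx]) ** (-beta)
        weights /= weights.max()
        batch = [self.data[i] for i in idx]
        return batch, idx, weights

    def update_priorities(self, idx, td_errors):
        # Priority is the absolute TD error plus a small constant.
        self.priorities[idx] = np.abs(td_errors) + self.eps
```

In a training loop, a learner would call `sample`, scale each sample's loss by the returned weight, and then pass the fresh TD errors back through `update_priorities`. How the paper combines this with PPO updates and asynchronous A3C workers is not stated in the abstract.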

Original language	English
Title of host publication	Advances in Guidance, Navigation and Control - Proceedings of 2024 International Conference on Guidance, Navigation and Control Volume 9
Editors	Liang Yan, Haibin Duan, Yimin Deng
Publisher	Springer Science and Business Media Deutschland GmbH
Pages	72-81
Number of pages	10
ISBN (Print)	9789819622313
DOI	https://doi.org/10.1007/978-981-96-2232-0_8
Publication status	Published - 2025
Event	International Conference on Guidance, Navigation and Control, ICGNC 2024 - Changsha, China. Duration: 9 Aug 2024 → 11 Aug 2024

Publication series

Name	Lecture Notes in Electrical Engineering
Volume	1345 LNEE
ISSN (Print)	1876-1100
ISSN (Electronic)	1876-1119

Conference

Conference	International Conference on Guidance, Navigation and Control, ICGNC 2024
Country/Territory	China
City	Changsha
Period	9/08/24 → 11/08/24


Cite this

Qiao, B., Jia, Z., Xiao, B., & Qian, H. (2025). Game Maneuver Decision-Making for Multi-UAV via PPO-A3C-PER Learning Method. In L. Yan, H. Duan, & Y. Deng (Eds.), Advances in Guidance, Navigation and Control - Proceedings of 2024 International Conference on Guidance, Navigation and Control Volume 9 (pp. 72-81). (Lecture Notes in Electrical Engineering; Vol. 1345 LNEE). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-981-96-2232-0_8

Qiao, Beibei ; Jia, Zhenshuai ; Xiao, Bing et al. / Game Maneuver Decision-Making for Multi-UAV via PPO-A3C-PER Learning Method. Advances in Guidance, Navigation and Control - Proceedings of 2024 International Conference on Guidance, Navigation and Control Volume 9. editor / Liang Yan ; Haibin Duan ; Yimin Deng. Springer Science and Business Media Deutschland GmbH, 2025. pp. 72-81 (Lecture Notes in Electrical Engineering).

@inproceedings{1b634236f3424c2286a57ba16edc7804,
title = "Game Maneuver Decision-Making for Multi-UAV via PPO-A3C-PER Learning Method",
abstract = "To address long training times, poor unmanned aerial vehicle (UAV) flexibility, and low utilization efficiency of experience pool samples in deep reinforcement learning for multi-UAV systems, a multi-UAV maneuver decision-making method based on continuous strategic action sets is proposed. The PPO-A3C-PER algorithm is proposed to shorten the long training time of the PPO algorithm. Four intelligent maneuvering strategies are proposed to address sluggish UAV performance in the multi-UAV game, and corresponding reward functions are designed for the four strategic behaviors of reconnaissance, pursuit, encirclement, and expulsion. With these, UAVs can complete roundup tasks in different scenarios, and the reinforcement learning algorithm based on Prioritized Experience Replay and the Asynchronous Advantage Actor-Critic method effectively improves the utilization efficiency of samples in the experience pool. Simulation results show that the algorithm converges faster than the PPO algorithm in the training phase: compared with PPO in the same environment, the training time is shortened by 39.71% and the targeting rate is improved by 26.32%.",
keywords = "A3C, making, maneuver decision, maneuvering strategies, multiple, PER, PPO, reinforcement learning, UAV",
author = "Beibei Qiao and Zhenshuai Jia and Bing Xiao and Hanyu Qian",
note = "Publisher Copyright: {\textcopyright} The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025.; International Conference on Guidance, Navigation and Control, ICGNC 2024 ; Conference date: 09-08-2024 Through 11-08-2024",
year = "2025",
doi = "10.1007/978-981-96-2232-0_8",
language = "English",
isbn = "9789819622313",
series = "Lecture Notes in Electrical Engineering",
publisher = "Springer Science and Business Media Deutschland GmbH",
pages = "72--81",
editor = "Liang Yan and Haibin Duan and Yimin Deng",
booktitle = "Advances in Guidance, Navigation and Control - Proceedings of 2024 International Conference on Guidance, Navigation and Control Volume 9",
}

Qiao, B, Jia, Z, Xiao, B & Qian, H 2025, Game Maneuver Decision-Making for Multi-UAV via PPO-A3C-PER Learning Method. in L Yan, H Duan & Y Deng (eds), Advances in Guidance, Navigation and Control - Proceedings of 2024 International Conference on Guidance, Navigation and Control Volume 9. Lecture Notes in Electrical Engineering, vol. 1345 LNEE, Springer Science and Business Media Deutschland GmbH, pp. 72-81, International Conference on Guidance, Navigation and Control, ICGNC 2024, Changsha, China, 9/08/24. https://doi.org/10.1007/978-981-96-2232-0_8

Game Maneuver Decision-Making for Multi-UAV via PPO-A3C-PER Learning Method. / Qiao, Beibei; Jia, Zhenshuai; Xiao, Bing et al.
Advances in Guidance, Navigation and Control - Proceedings of 2024 International Conference on Guidance, Navigation and Control Volume 9. editor / Liang Yan; Haibin Duan; Yimin Deng. Springer Science and Business Media Deutschland GmbH, 2025. pp. 72-81 (Lecture Notes in Electrical Engineering; Vol. 1345 LNEE).


TY - GEN

T1 - Game Maneuver Decision-Making for Multi-UAV via PPO-A3C-PER Learning Method

AU - Qiao, Beibei

AU - Jia, Zhenshuai

AU - Xiao, Bing

AU - Qian, Hanyu

N1 - Publisher Copyright: © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025.

PY - 2025

Y1 - 2025

AB - To address long training times, poor unmanned aerial vehicle (UAV) flexibility, and low utilization efficiency of experience pool samples in deep reinforcement learning for multi-UAV systems, a multi-UAV maneuver decision-making method based on continuous strategic action sets is proposed. The PPO-A3C-PER algorithm is proposed to shorten the long training time of the PPO algorithm. Four intelligent maneuvering strategies are proposed to address sluggish UAV performance in the multi-UAV game, and corresponding reward functions are designed for the four strategic behaviors of reconnaissance, pursuit, encirclement, and expulsion. With these, UAVs can complete roundup tasks in different scenarios, and the reinforcement learning algorithm based on Prioritized Experience Replay and the Asynchronous Advantage Actor-Critic method effectively improves the utilization efficiency of samples in the experience pool. Simulation results show that the algorithm converges faster than the PPO algorithm in the training phase: compared with PPO in the same environment, the training time is shortened by 39.71% and the targeting rate is improved by 26.32%.

KW - A3C

KW - making

KW - maneuver decision

KW - maneuvering strategies

KW - multiple

KW - PER

KW - PPO

KW - reinforcement learning

KW - UAV

UR - http://www.scopus.com/inward/record.url?scp=105006464636&partnerID=8YFLogxK

U2 - 10.1007/978-981-96-2232-0_8

DO - 10.1007/978-981-96-2232-0_8

M3 - Conference contribution

AN - SCOPUS:105006464636

SN - 9789819622313

T3 - Lecture Notes in Electrical Engineering

SP - 72

EP - 81

BT - Advances in Guidance, Navigation and Control - Proceedings of 2024 International Conference on Guidance, Navigation and Control Volume 9

A2 - Yan, Liang

A2 - Duan, Haibin

A2 - Deng, Yimin

PB - Springer Science and Business Media Deutschland GmbH

T2 - International Conference on Guidance, Navigation and Control, ICGNC 2024

Y2 - 9 August 2024 through 11 August 2024

ER -

Qiao B, Jia Z, Xiao B, Qian H. Game Maneuver Decision-Making for Multi-UAV via PPO-A3C-PER Learning Method. In Yan L, Duan H, Deng Y, editors, Advances in Guidance, Navigation and Control - Proceedings of 2024 International Conference on Guidance, Navigation and Control Volume 9. Springer Science and Business Media Deutschland GmbH. 2025. p. 72-81. (Lecture Notes in Electrical Engineering). doi: 10.1007/978-981-96-2232-0_8
