Game Maneuver Decision-Making for Multi-UAV via PPO-A3C-PER Learning Method

Beibei Qiao, Zhenshuai Jia, Bing Xiao, Hanyu Qian

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

To address long training times, poor unmanned aerial vehicle (UAV) flexibility, and low utilization of experience-pool samples in deep reinforcement learning for multi-UAV systems, a multi-UAV maneuver decision-making method based on continuous strategic action sets is proposed. The PPO-A3C-PER algorithm is proposed to shorten the long training time of the PPO algorithm. Four intelligent maneuvering strategies are proposed to address sluggish UAV performance in the multi-UAV game, and corresponding reward functions are designed for the four strategic behaviors of reconnaissance, pursuit, encirclement, and expulsion. UAVs can complete roundup tasks in different scenarios, and the reinforcement learning algorithm based on Prioritized Experience Replay and the Asynchronous Advantage Actor-Critic method effectively improves the utilization efficiency of samples in the experience pool. Simulation results show that the algorithm converges faster than the PPO algorithm in the training phase: in the same environment, the training time is shortened by 39.71% and the targeting rate is improved by 26.32% compared with the PPO algorithm.
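
For readers unfamiliar with Prioritized Experience Replay, the following is a minimal sketch of the standard proportional variant, in which transitions with larger temporal-difference (TD) error are replayed more often. It is not the authors' implementation: the abstract does not specify their variant, and the class name, the `alpha`, `beta`, and `eps` defaults, and the list-based storage are all assumptions made for illustration.

```python
import random


class PrioritizedReplayBuffer:
    """Illustrative proportional prioritized replay (hypothetical helper).

    Uses plain lists and O(N) sampling for clarity; practical
    implementations usually use a sum-tree instead.
    """

    def __init__(self, capacity, alpha=0.6, eps=1e-5):
        self.capacity = capacity
        self.alpha = alpha      # how strongly priorities skew sampling (assumed value)
        self.eps = eps          # keeps every priority strictly positive
        self.data = []
        self.priorities = []    # stores (|td| + eps) ** alpha per transition
        self.pos = 0            # next slot to overwrite once the buffer is full

    def add(self, transition):
        # New transitions receive the current maximum priority so that
        # they are likely to be replayed at least once.
        p = max(self.priorities, default=1.0)
        if len(self.data) < self.capacity:
            self.data.append(transition)
            self.priorities.append(p)
        else:
            self.data[self.pos] = transition
            self.priorities[self.pos] = p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4, rng=random):
        n = len(self.data)
        total = sum(self.priorities)
        probs = [p / total for p in self.priorities]
        idxs = rng.choices(range(n), weights=probs, k=batch_size)
        # Importance-sampling weights correct the bias from non-uniform
        # sampling; here they are normalized by the largest weight in the batch.
        weights = [(n * probs[i]) ** (-beta) for i in idxs]
        max_w = max(weights)
        weights = [w / max_w for w in weights]
        return idxs, [self.data[i] for i in idxs], weights

    def update_priorities(self, idxs, td_errors):
        # Called after a learning step with the fresh TD errors.
        for i, td in zip(idxs, td_errors):
            self.priorities[i] = (abs(td) + self.eps) ** self.alpha
```

In a training loop, a learner would call `sample`, scale each transition's loss by its returned weight, and then pass the new TD errors to `update_priorities`. How this interacts with the PPO and A3C components in the paper is not described in the abstract.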

Original language: English
Title of host publication: Advances in Guidance, Navigation and Control - Proceedings of 2024 International Conference on Guidance, Navigation and Control Volume 9
Editors: Liang Yan, Haibin Duan, Yimin Deng
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 72-81
Number of pages: 10
ISBN (Print): 9789819622313
Publication status: Published - 2025
Event: International Conference on Guidance, Navigation and Control, ICGNC 2024 - Changsha, China
Duration: 9 Aug 2024 to 11 Aug 2024

Publication series

Name: Lecture Notes in Electrical Engineering
Volume: 1345 LNEE
ISSN (Print): 1876-1100
ISSN (Electronic): 1876-1119

Conference

Conference: International Conference on Guidance, Navigation and Control, ICGNC 2024
Country/Territory: China
City: Changsha
Period: 9/08/24 to 11/08/24
