基于深度强化学习与自学习的多无人机近距空战机动策略生成算法

Translated title of the contribution: Maneuvering strategy generation algorithm for multi-UAV in close-range air combat based on deep reinforcement learning and self-play

Wei Ren Kong, De Yun Zhou, Yi Yang Zhao, Wan Sha Yang

Research output: Contribution to journal › Article › peer-review

14 Scopus citations

Abstract

To address maneuvering decision-making in multi-UAV close-range air combat, a maneuvering strategy generation algorithm based on a parameter-sharing Q network and neural fictitious self-play is proposed. First, a hybrid Markov game model suitable for different UAV formation sizes is designed, together with a reinforcement learning framework for generating multi-UAV maneuvering decision strategies, the parameter-sharing Q network; the state space is compressed with an autoencoder to improve the efficiency of strategy learning. Then, neural fictitious self-play is used to make the maneuver strategy converge to a Nash equilibrium strategy. Finally, simulation experiments are carried out on the parameter selection of the autoencoder, the training process of the strategy generation algorithm, and the rationality and portability of the maneuver strategy. The simulation results show that introducing the autoencoder effectively improves the efficiency of strategy learning, and that the multi-UAV close-range air combat maneuver strategy generated by the algorithm is reasonable and has good portability.
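
The convergence claim in the abstract can be illustrated with a much simpler stand-in. The sketch below is not the paper's algorithm: it runs classical tabular fictitious play on the two-action matching pennies game, where each player repeatedly best-responds to the opponent's empirical action frequencies. Neural fictitious self-play, as the name suggests, replaces the tabular best response and average strategy with learned networks; the game, the payoff matrix, and the names `PAYOFF`, `best_response`, and `fictitious_play` are assumptions made purely for illustration.

```python
# Illustrative stand-in only: tabular fictitious play on matching pennies.
# The paper uses neural fictitious self-play on an air combat Markov game;
# everything named here is hypothetical and chosen for brevity.

# Row player's payoff, indexed [row_action][col_action].
# The game is zero-sum, so the column player's payoff is the negation.
PAYOFF = [[1, -1], [-1, 1]]


def best_response(payoff_rows, opponent_counts):
    """Pick the action with the highest payoff against the opponent's
    empirical action counts (ties go to the lowest action index)."""
    values = [
        sum(p * c for p, c in zip(row, opponent_counts))
        for row in payoff_rows
    ]
    return values.index(max(values))


def fictitious_play(steps):
    """Run simultaneous fictitious play and return both players'
    empirical action frequencies after `steps` rounds."""
    row_counts = [0, 0]
    col_counts = [0, 0]
    # Column player's payoff, indexed [col_action][row_action].
    col_payoff = [[-PAYOFF[r][c] for r in range(2)] for c in range(2)]
    for _ in range(steps):
        # Both players choose before either count is updated.
        a_row = best_response(PAYOFF, col_counts)
        a_col = best_response(col_payoff, row_counts)
        row_counts[a_row] += 1
        col_counts[a_col] += 1
    row_avg = [n / steps for n in row_counts]
    col_avg = [n / steps for n in col_counts]
    return row_avg, col_avg


if __name__ == "__main__":
    row_avg, col_avg = fictitious_play(10000)
    print("row player's average strategy:", row_avg)
    print("column player's average strategy:", col_avg)
```

In this toy game the unique Nash equilibrium is for both players to mix their two actions equally, and the empirical frequencies drift toward that mix as the number of rounds grows. The individual best responses keep cycling; only the average strategy settles, which is why fictitious-play methods track an average strategy alongside the best response.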

Original language: Chinese (Traditional)
Pages (from-to): 352-362
Number of pages: 11
Journal: Kongzhi Lilun Yu Yingyong/Control Theory and Applications
Volume: 39
Issue number: 2
DOIs
State: Published - Feb 2022
