TY - JOUR
T1 - Air combat joint strategy learning based on a dual-loop framework and hindsight experience replay
AU - Zhang, Yuhe
AU - Yang, Zhen
AU - Zhang, Bao
AU - Wang, Xingyu
AU - Piao, Haiyin
AU - Zhou, Deyun
N1 - Publisher Copyright:
© The Author(s) 2026. Published by Oxford University Press on behalf of the Society for Computational Design and Engineering.
PY - 2026/3/1
Y1 - 2026/3/1
N2 - Air combat decision-making based on artificial intelligence has become a widely studied field. However, owing to the complexity of the air combat process and the hybrid (discrete/continuous) action-selection problem, traditional methods struggle to decide on continuous maneuvering and discrete missile-launching actions simultaneously. In addition, designing complex dense reward functions requires aviation expert knowledge that is difficult to obtain, while relying on sparse reward functions makes it hard to explore a large state space fully. In view of this, we propose a novel algorithm based on a dual-loop framework. The core idea is to separate maneuvering and missile-launching decisions into two optimization processes within the training loop, enabling joint decision-making during the search phase while allowing independent optimization during the optimization phase. Moreover, hindsight experience replay is adopted to train missile-launching decisions; it expands the set of valuable learning samples through a sample-relabelling approach. We designed a series of experiments to validate the performance of the proposed method, constructing the opponent's strategy from a self-play agent and an air combat bot. The method was validated in a simulation environment, demonstrating that it can generate an air combat joint strategy incorporating both maneuvering and missile launching. In adversarial experiments, the generated joint strategy achieved a higher win rate than other state-of-the-art air combat methods.
AB - Air combat decision-making based on artificial intelligence has become a widely studied field. However, owing to the complexity of the air combat process and the hybrid (discrete/continuous) action-selection problem, traditional methods struggle to decide on continuous maneuvering and discrete missile-launching actions simultaneously. In addition, designing complex dense reward functions requires aviation expert knowledge that is difficult to obtain, while relying on sparse reward functions makes it hard to explore a large state space fully. In view of this, we propose a novel algorithm based on a dual-loop framework. The core idea is to separate maneuvering and missile-launching decisions into two optimization processes within the training loop, enabling joint decision-making during the search phase while allowing independent optimization during the optimization phase. Moreover, hindsight experience replay is adopted to train missile-launching decisions; it expands the set of valuable learning samples through a sample-relabelling approach. We designed a series of experiments to validate the performance of the proposed method, constructing the opponent's strategy from a self-play agent and an air combat bot. The method was validated in a simulation environment, demonstrating that it can generate an air combat joint strategy incorporating both maneuvering and missile launching. In adversarial experiments, the generated joint strategy achieved a higher win rate than other state-of-the-art air combat methods.
KW - air combat
KW - dual-loop framework
KW - hindsight experience replay
KW - hybrid action space
KW - reinforcement learning
UR - https://www.scopus.com/pages/publications/105031608976
U2 - 10.1093/jcde/qwag006
DO - 10.1093/jcde/qwag006
M3 - Article
AN - SCOPUS:105031608976
SN - 2288-4300
VL - 13
SP - 1
EP - 22
JO - Journal of Computational Design and Engineering
JF - Journal of Computational Design and Engineering
IS - 3
ER -