A preference-based Reinforcement Learning method of maneuver decision-making in air combat

Research output: Contribution to journal › Article › peer-review

Abstract

Reinforcement Learning (RL) techniques have advanced significantly in addressing maneuver decision-making problems in air combat. However, existing RL methods with fixed reward structures suffer from inconsistent preferences between dense and sparse rewards, which hinders efficient learning of the optimal policy. To overcome this challenge, a preference-based reinforcement learning method for maneuver decision-making in air combat is proposed. First, a Preference-Based Adaptive Reward Weights Generation (PBARWG) model is proposed to generate the weights of the dense rewards adaptively. This model formulates preference relationships by comparing the discounted cumulative sparse rewards across different processes, and it aligns the preferences between the dense and sparse rewards by minimizing a preference loss function. Then, to capture the temporal features of air combat, an improved Multi-Agent Proximal Policy Optimization (MAPPO) model with a Gated Recurrent Unit (GRU) and a residual structure, designated MAPPO-GRU-PBARWG, is proposed to obtain an effective maneuver policy. Finally, comparative experiments demonstrate that the proposed method outperforms other methods, achieving a win rate of more than 50%, an extremely low crash rate, and a higher average reward level. This study highlights the effectiveness of adaptive weight generation and efficient temporal feature extraction in producing air combat strategies, and it provides a viable approach for autonomous maneuver decision-making in short-range air combat scenarios.
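The preference mechanism described in the abstract can be illustrated with a minimal sketch. The abstract does not give the form of the preference loss, so the code below assumes a Bradley-Terry style cross-entropy, which is a common choice in preference-based RL and is not confirmed as the paper's formulation. All function names, array shapes, and the discount factor are hypothetical and chosen for illustration only.

```python
import numpy as np


def discounted_return(rewards, gamma=0.99):
    """Discounted cumulative reward of one episode (list of per-step rewards)."""
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total


def preference_label(sparse_a, sparse_b, gamma=0.99):
    """Label a pair of episodes by comparing discounted sparse returns.

    Returns 1.0 if episode A is preferred, 0.0 if B is preferred, 0.5 on a tie.
    """
    g_a = discounted_return(sparse_a, gamma)
    g_b = discounted_return(sparse_b, gamma)
    if g_a > g_b:
        return 1.0
    if g_a < g_b:
        return 0.0
    return 0.5


def preference_loss(weights, dense_a, dense_b, label, gamma=0.99):
    """Assumed Bradley-Terry cross-entropy on weighted dense returns.

    dense_a, dense_b: arrays of shape (T, K) holding K dense reward
    components per step. weights: array of shape (K,).
    """
    disc_a = gamma ** np.arange(dense_a.shape[0])
    disc_b = gamma ** np.arange(dense_b.shape[0])
    score_a = disc_a @ dense_a @ weights
    score_b = disc_b @ dense_b @ weights
    # Probability that A is preferred under the current dense-reward weights.
    p_a = 1.0 / (1.0 + np.exp(-(score_a - score_b)))
    p_a = np.clip(p_a, 1e-12, 1.0 - 1e-12)
    return float(-(label * np.log(p_a) + (1.0 - label) * np.log(1.0 - p_a)))


# Episode A ends in a win (+1), episode B in a loss (-1).
label = preference_label([0.0, 0.0, 1.0], [0.0, 0.0, -1.0])
dense_a = np.ones((3, 2))
dense_b = np.zeros((3, 2))
loss_uninformed = preference_loss(np.zeros(2), dense_a, dense_b, label, gamma=0.5)
loss_aligned = preference_loss(np.ones(2), dense_a, dense_b, label, gamma=0.5)
```

In this toy case the loss is lower when the weights make the dense return rank the two episodes the same way the sparse outcome does, which is the alignment the abstract refers to. How the weights are actually parameterized and updated in PBARWG is not stated in the abstract.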

Original language: English
Article number: 113761
Journal: Engineering Applications of Artificial Intelligence
Volume: 167
State: Published - 1 Mar 2026

Keywords

  • Air combat
  • Maneuver decision-making
  • Preference learning
  • Reinforcement learning
