
A preference-based Reinforcement Learning method of maneuver decision-making in air combat

  • National Key Laboratory of Aircraft Configuration Design
  • Northwestern Polytechnical University Xian

Research output: Contribution to journal › Article › peer-review

Abstract

Reinforcement Learning (RL) techniques have advanced significantly in addressing maneuver decision-making problems in air combat. However, existing RL methods with fixed reward structures face limitations of inconsistent preferences between dense and sparse rewards, hindering the efficient learning of the optimal policy. To overcome this challenge, a preference-based reinforcement learning method of maneuver decision-making in air combat is proposed. First, a Preference-Based Adaptive Reward Weights Generation (PBARWG) model is proposed to generate the weights of the dense rewards adaptively. This model formulates preference relationships by comparing the discounted cumulative sparse rewards across different processes. Concurrently, the preferences between the dense and sparse rewards are aligned by minimizing the preference loss function. Then, in response to the temporal features of air combat, an improved Multi-Agent Proximal Policy Optimization (MAPPO) model with the Gated Recurrent Unit (GRU) and residual structure, designated as MAPPO-GRU-PBARWG, is proposed to obtain the effective maneuver policy. Finally, results from the comparative experiments have demonstrated that the proposed method outperforms other methods, achieving a win rate of more than 50%, an extremely low crash rate, and a higher average reward level. This study highlights the effectiveness of adaptive weight generation and efficient temporal feature extraction techniques in producing air combat strategies, and provides a viable approach for autonomous maneuver decision-making in short-range air combat scenarios.
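The abstract does not state the exact form of the preference loss, so the following is only a minimal sketch of the general idea, not the PBARWG model itself. It assumes a Bradley-Terry style cross-entropy loss, which is a common choice in preference-based RL, and a plain weight vector over the dense reward components in place of whatever weight generator the paper uses. All function names, the discount factor, the learning rate, and the toy trajectories are hypothetical.

```python
import numpy as np


def discounted_sum(values, gamma):
    """Return sum_t gamma**t * values[t] for an array of shape (T,) or (T, K)."""
    values = np.asarray(values, dtype=float)
    discounts = gamma ** np.arange(values.shape[0])
    return discounts @ values


def preference_label(sparse_a, sparse_b, gamma):
    """Label a pair of trajectories by comparing discounted sparse returns.

    Returns 1.0 if A is preferred, 0.0 if B is preferred, 0.5 on a tie.
    """
    ret_a = discounted_sum(sparse_a, gamma)
    ret_b = discounted_sum(sparse_b, gamma)
    if np.isclose(ret_a, ret_b):
        return 0.5
    return 1.0 if ret_a > ret_b else 0.0


def preference_loss(weights, dense_a, dense_b, label, gamma):
    """Assumed Bradley-Terry cross-entropy loss and its gradient w.r.t. weights.

    dense_a and dense_b have shape (T, K): K dense reward components per step.
    """
    # Difference of discounted per-component dense returns, shape (K,).
    diff = discounted_sum(dense_a, gamma) - discounted_sum(dense_b, gamma)
    z = float(diff @ weights)
    # -[y*log(sigmoid(z)) + (1-y)*log(1-sigmoid(z))] = log(1 + e^z) - y*z
    loss = np.logaddexp(0.0, z) - label * z
    prob_a = 1.0 / (1.0 + np.exp(-z))
    grad = (prob_a - label) * diff
    return float(loss), grad


# Toy data (hypothetical): trajectory A wins, trajectory B loses.
gamma = 0.9
sparse_a, sparse_b = [0.0, 1.0], [0.0, -1.0]
dense_a = [[1.0, 0.0], [1.0, 0.0]]  # A scores on dense component 0
dense_b = [[0.0, 1.0], [0.0, 1.0]]  # B scores on dense component 1

label = preference_label(sparse_a, sparse_b, gamma)
weights = np.array([0.5, 0.5])
initial_loss, _ = preference_loss(weights, dense_a, dense_b, label, gamma)

# Plain gradient descent on the dense-reward weights.
for _ in range(50):
    _, grad = preference_loss(weights, dense_a, dense_b, label, gamma)
    weights = weights - 0.1 * grad

final_loss, _ = preference_loss(weights, dense_a, dense_b, label, gamma)
```

In this toy setting the weight on the dense component that agrees with the sparse outcome grows while the other shrinks, which is the sense in which dense and sparse preferences become aligned. The weights here are unconstrained; a practical variant might normalize them or produce them from a learned model.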

Original language: English
Article number: 113761
Journal: Engineering Applications of Artificial Intelligence
Volume: 167
DOI
Publication status: Published - 1 Mar 2026
