A Non-Fixed-Duration Turn-Based Orbital Game Algorithm Based on History-Aware and Predictive Result Reward Proximal Policy Optimization

  • Xuyang Cao
  • Xin Ning
  • Xiaobin Lian
  • Tongshu Zhang
  • Suyi Li
  • Gaopeng Zhang
  • Zhansheng Chen

Research output: Contribution to journal › Article › peer-review

Abstract

Orbital pursuit–evasion games have attracted increasing attention in recent years. Most existing studies focus on extremely close-range continuous maneuvering scenarios, while medium- and long-range impulsive maneuvering problems are often studied under simplified assumptions, such as synchronous decision-making and maneuver execution, or fixed information measurement delays. In practice, however, satellite maneuver execution typically lags behind decision-making due to control delays and decision validation processes, and the measurement of orbital information is also subject to non-fixed delays caused by environmental and operational factors. To account for this characteristic, this paper considers a non-fixed-duration turn-based orbital pursuit–evasion game setting, in which decision-making and maneuver execution do not occur synchronously and the execution delay is determined by the spacecraft itself. Within this setting, a History-aware and Predictive Result Reward Proximal Policy Optimization (HPRR-PPO) method and an alternating reinforcement learning training scheme are developed. By introducing a predictive result reward and a history-aware terminal reward, the proposed method provides more informative learning signals under non-fixed-duration turn-based interactions. Simulation results demonstrate that the proposed approach achieves better task performance and training efficiency than several baseline methods in the considered orbital pursuit–evasion scenarios.
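
For orientation, the method named in the abstract builds on Proximal Policy Optimization. The sketch below shows only the standard PPO clipped surrogate loss, not the HPRR-PPO method itself: the abstract does not specify the form of the predictive result reward or the history-aware terminal reward, so neither is modeled here. The function name `ppo_clip_loss`, its parameters, and the default `clip_eps=0.2` are illustrative assumptions.

```python
import math


def ppo_clip_loss(new_logp, old_logp, advantages, clip_eps=0.2):
    """Mean clipped surrogate loss of standard PPO, returned as a value to minimize.

    new_logp / old_logp: log-probabilities of the taken actions under the
    current and the data-collecting policy (hypothetical inputs).
    advantages: advantage estimates for the same samples.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        # Probability ratio between the current and the old policy.
        ratio = math.exp(lp_new - lp_old)
        # Keep the ratio inside [1 - clip_eps, 1 + clip_eps].
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Pessimistic (minimum) of the unclipped and clipped objectives.
        total += min(ratio * adv, clipped * adv)
    # Negate because optimizers minimize, while PPO maximizes the surrogate.
    return -total / len(advantages)
```

In a setting like the one described, any shaped reward terms would enter through the advantage estimates passed to such a loss; how the paper constructs them is not stated in the abstract.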

Original language: English
Journal: IEEE Transactions on Aerospace and Electronic Systems
State: Accepted/In press - 2026

Keywords

  • History-aware
  • non-fixed-duration turn-based game
  • orbital pursuit-evasion game
  • predictive result reward
  • reinforcement learning
