
A Nonfixed-Duration Turn-Based Orbital Game Algorithm Based on History-Aware and Predictive Result Reward Proximal Policy Optimization

  • Xuyang Cao
  • Xin Ning
  • Xiaobin Lian
  • Tongshu Zhang
  • Suyi Liu
  • Gaopeng Zhang
  • Zhansheng Chen

  • Northwestern Polytechnical University Xian
  • CAS - Xi'an Institute of Optics and Precision Mechanics
  • China Aerospace Science and Technology Corporation

Research output: Contribution to journal › Article › peer-review

Abstract

Orbital pursuit-evasion games have attracted increasing attention in recent years. Most existing studies focus on extremely close-range continuous maneuvering scenarios, while medium- and long-range impulsive maneuvering problems are often studied under simplified assumptions, such as synchronous decision making and maneuver execution, or fixed information measurement delays. In practice, however, satellite maneuver execution typically lags behind decision making because of control delays and decision validation processes, and the measurement of orbital information is also subject to nonfixed delays caused by environmental and operational factors. To account for these characteristics, this article considers a nonfixed-duration turn-based orbital pursuit-evasion game setting, in which decision making and maneuver execution do not occur synchronously and the execution delay is determined by the spacecraft itself. Within this setting, a History-aware and Predictive Result Reward Proximal Policy Optimization method and an alternating reinforcement learning training scheme are developed. By introducing a predictive result reward and a history-aware terminal reward, the proposed method provides more informative learning signals under nonfixed-duration turn-based interactions. Simulation results demonstrate that the proposed approach achieves better task performance and training efficiency than several baseline methods in the considered orbital pursuit-evasion scenarios.
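The ordering of turns in a nonfixed-duration turn-based game can be illustrated with a small event-queue sketch. It assumes, purely for illustration, that each agent's next decision point arrives only after its own maneuver, with a self-chosen execution delay, has finished, so turns interleave by simulation time rather than strictly alternating. The function `schedule_turns`, its `delays` and `horizon` parameters, and the agent names are hypothetical; this is not the authors' simulation environment, and it ignores orbital dynamics and measurement delays entirely.

```python
import heapq


def schedule_turns(delays, horizon):
    """Return the ordered (time, agent) decision points of a turn-based game.

    Hypothetical illustration: `delays` maps each agent name to the list of
    execution delays (in seconds) it picks for its successive maneuvers.
    An agent decides again only once its previous maneuver has completed.
    Scheduling stops at the first decision time beyond `horizon`.
    """
    # Every agent makes its first decision at t = 0; ties on time are
    # broken by agent name because the heap compares (time, agent) tuples.
    queue = [(0.0, agent) for agent in sorted(delays)]
    heapq.heapify(queue)
    next_index = {agent: 0 for agent in delays}
    order = []
    while queue:
        time, agent = heapq.heappop(queue)
        if time > horizon:
            break
        order.append((time, agent))
        i = next_index[agent]
        if i < len(delays[agent]):
            next_index[agent] = i + 1
            # The next decision is deferred by this agent's own delay,
            # which is what makes turn durations nonfixed.
            heapq.heappush(queue, (time + delays[agent][i], agent))
    return order


if __name__ == "__main__":
    plan = {"pursuer": [30.0, 30.0, 30.0], "evader": [50.0, 20.0]}
    for t, who in schedule_turns(plan, horizon=100.0):
        print(f"t = {t:5.1f} s: {who} decides")
```

With a pursuer that maneuvers every 30 s and an evader that chooses 50 s and then 20 s, the pursuer decides twice (at t = 0 and t = 30 s) before the evader's second decision at t = 50 s, so neither side is guaranteed a strict alternation. A time-keyed priority queue is used here simply because it yields the next decision point in O(log n) per event.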

Original language: English
Pages (from-to): 5213-5229
Number of pages: 17
Journal: IEEE Transactions on Aerospace and Electronic Systems
Volume: 62
State: Published - 2026

Keywords

  • History aware
  • nonfixed-duration turn-based game
  • orbital pursuit-evasion game
  • predictive result reward
  • reinforcement learning

