TY - JOUR
T1 - A Nonfixed-Duration Turn-Based Orbital Game Algorithm Based on History-Aware and Predictive Result Reward Proximal Policy Optimization
AU - Cao, Xuyang
AU - Ning, Xin
AU - Lian, Xiaobin
AU - Zhang, Tongshu
AU - Liu, Suyi
AU - Zhang, Gaopeng
AU - Chen, Zhansheng
N1 - Publisher Copyright: © 1965-2011 IEEE.
PY - 2026
Y1 - 2026
AB - Orbital pursuit-evasion games have attracted increasing attention in recent years. Most existing studies focus on extremely close-range continuous maneuvering scenarios, while medium- and long-range impulsive maneuvering problems are often studied under simplified assumptions, such as synchronous decision making and maneuver execution, or fixed information measurement delays. In practice, however, satellite maneuver execution typically lags behind decision making due to control delays and decision validation processes, and the measurement of orbital information is also subject to nonfixed delays caused by environmental and operational factors. To account for these characteristics, this article considers a nonfixed-duration turn-based orbital pursuit-evasion game setting, in which decision making and maneuver execution do not occur synchronously and the execution delay is determined by the spacecraft itself. Within this setting, a History-aware and Predictive Result Reward Proximal Policy Optimization method and an alternating reinforcement learning training scheme are developed. By introducing a predictive result reward and a history-aware terminal reward, the proposed method provides more informative learning signals under nonfixed-duration turn-based interactions. Simulation results demonstrate that the proposed approach achieves better task performance and training efficiency than several baseline methods in the considered orbital pursuit-evasion scenarios.
KW - history-aware
KW - nonfixed-duration turn-based game
KW - orbital pursuit-evasion game
KW - predictive result reward
KW - reinforcement learning
UR - https://www.scopus.com/pages/publications/105028603484
U2 - 10.1109/TAES.2026.3657323
DO - 10.1109/TAES.2026.3657323
M3 - Article
AN - SCOPUS:105028603484
SN - 0018-9251
VL - 62
SP - 5213
EP - 5229
JO - IEEE Transactions on Aerospace and Electronic Systems
JF - IEEE Transactions on Aerospace and Electronic Systems
ER -