
A Nonfixed-Duration Turn-Based Orbital Game Algorithm Based on History-Aware and Predictive Result Reward Proximal Policy Optimization

  • Xuyang Cao
  • Xin Ning
  • Xiaobin Lian
  • Tongshu Zhang
  • Suyi Liu
  • Gaopeng Zhang
  • Zhansheng Chen
  • Northwestern Polytechnical University Xian
  • CAS - Xi'an Institute of Optics and Precision Mechanics
  • China Aerospace Science and Technology Corporation

Research output: Contribution to journal, Article, peer-reviewed

Abstract

Orbital pursuit-evasion games have attracted increasing attention in recent years. Most existing studies focus on extremely close-range continuous maneuvering scenarios, while medium- and long-range impulsive maneuvering problems are often studied under simplified assumptions, such as synchronous decision making and maneuver execution, or fixed information measurement delays. In practice, however, satellite maneuver execution typically lags behind decision making due to control delays and decision validation processes, and the measurement of orbital information is also subject to nonfixed delays caused by environmental and operational factors. To account for this characteristic, this article considers a nonfixed-duration turn-based orbital pursuit-evasion game setting, in which decision-making and maneuver execution do not occur synchronously and the execution delay is determined by the spacecraft itself. Within this setting, a History-aware and Predictive Result Reward Proximal Policy Optimization method and an alternating reinforcement learning training scheme are developed. By introducing a predictive result reward and a history-aware terminal reward, the proposed method provides more informative learning signals under nonfixed-duration turn-based interactions. Simulation results demonstrate that the proposed approach achieves better task performance and training efficiency than several baseline methods in the considered orbital pursuit-evasion scenarios.

Original language: English
Pages (from-to): 5213-5229
Number of pages: 17
Journal: IEEE Transactions on Aerospace and Electronic Systems
Volume: 62
DOI
Publication status: Published - 2026
