TY - JOUR
T1 - Continuous-time hierarchical reinforcement learning for satellite pursuit decision
AU - WEI, Linsen
AU - NING, Xin
AU - LIAN, Xiaobin
AU - WANG, Feng
AU - ZHANG, Gaopeng
AU - LIN, Mingpei
N1 - Publisher Copyright:
© 2025 The Authors
PY - 2025/12
Y1 - 2025/12
N2 - The satellite orbital pursuit game studies spacecraft maneuvering strategies in space. Traditional numerical methods often suffer from inadequate real-time performance and limited adaptability when handling highly nonlinear problems. With advances in Deep Reinforcement Learning (DRL), continuous-time orbital control capabilities have improved significantly. Nevertheless, existing DRL techniques still require adjustments to action delay handling and discretization structure to better suit practical application scenarios. Combining continuous learning with model planning demonstrates the adaptability of these methods to continuous-time decision problems. In addition, to handle action delays more effectively, a new scheduled action execution technique is developed; it optimizes action execution timing through real-time policy adjustments and thereby adapts to dynamic changes in the orbital environment. A Hierarchical Reinforcement Learning (HRL) strategy is also adopted to simplify decision-making in long-distance pursuit tasks by setting phased subgoals that progressively approach the target. The effectiveness of the proposed strategy in practical satellite pursuit scenarios is verified through simulations of two different tasks.
AB - The satellite orbital pursuit game studies spacecraft maneuvering strategies in space. Traditional numerical methods often suffer from inadequate real-time performance and limited adaptability when handling highly nonlinear problems. With advances in Deep Reinforcement Learning (DRL), continuous-time orbital control capabilities have improved significantly. Nevertheless, existing DRL techniques still require adjustments to action delay handling and discretization structure to better suit practical application scenarios. Combining continuous learning with model planning demonstrates the adaptability of these methods to continuous-time decision problems. In addition, to handle action delays more effectively, a new scheduled action execution technique is developed; it optimizes action execution timing through real-time policy adjustments and thereby adapts to dynamic changes in the orbital environment. A Hierarchical Reinforcement Learning (HRL) strategy is also adopted to simplify decision-making in long-distance pursuit tasks by setting phased subgoals that progressively approach the target. The effectiveness of the proposed strategy in practical satellite pursuit scenarios is verified through simulations of two different tasks.
KW - Continuous-time decision
KW - Hierarchical reinforcement learning
KW - Intelligent decision
KW - Orbital pursuit game
KW - Trajectory planning
UR - https://www.scopus.com/pages/publications/105019515410
U2 - 10.1016/j.cja.2025.103662
DO - 10.1016/j.cja.2025.103662
M3 - Article
AN - SCOPUS:105019515410
SN - 1000-9361
VL - 38
JO - Chinese Journal of Aeronautics
JF - Chinese Journal of Aeronautics
IS - 12
M1 - 103662
ER -