TY - JOUR
T1 - Hierarchical Reinforcement-Learning for Real-Time Scheduling of Agile Satellites
AU - Ren, Lili
AU - Ning, Xin
AU - Li, Jiayin
N1 - Publisher Copyright:
© 2013 IEEE.
PY - 2020
Y1 - 2020
N2 - Satellite resources are extremely scarce relative to observation demands, which makes Earth observation satellite (EOS) scheduling a problem of significant importance. Because the problem is NP-hard, an optimal solution is difficult to obtain, and real-time scheduling makes it even more challenging. Although fruitful results have been achieved in EOS scheduling, a number of limitations remain. For example, response speed and stability are limited when scheduling urgent tasks that arrive stochastically. To overcome this obstacle, this paper proposes a reinforcement learning algorithm that responds quickly to urgent task scheduling. To improve scheduling stability and reduce computational complexity, the algorithm uses a hierarchical architecture with two layers. In each layer, an online learning paradigm is adopted to explore a scheduling strategy during the learning phase. Under the algorithm, when urgent tasks arrive randomly, the satellite takes a scheduling action according to a certain strategy. The environment then returns a reward corresponding to the action taken. After repeated feedback, the satellite selects the action that yields the greatest benefit. In practical space applications, the satellite can use the learned strategy to perform low-orbit satellite selection and observation time window (OTW) assignment for urgent tasks in stochastic scenarios, which realizes immediate scheduling while maximizing scheduling stability. Finally, a numerical experiment is performed. The simulation results demonstrate that, compared with the heuristic m-WSITF algorithm, the proposed algorithm has significant advantages in effectiveness and efficiency, especially in response speed and stability.
KW - Hierarchical architecture
KW - multiple agile satellite
KW - Q learning
KW - real-time scheduling
UR - http://www.scopus.com/inward/record.url?scp=85097936856&partnerID=8YFLogxK
U2 - 10.1109/ACCESS.2020.3040748
DO - 10.1109/ACCESS.2020.3040748
M3 - Article
AN - SCOPUS:85097936856
SN - 2169-3536
VL - 8
SP - 220523
EP - 220532
JO - IEEE Access
JF - IEEE Access
M1 - 9272312
ER -