Hierarchical Reinforcement-Learning for Real-Time Scheduling of Agile Satellites

Lili Ren, Xin Ning, Jiayin Li

Research output: Contribution to journal › Article › peer-review

17 Scopus citations

Abstract

As is well known, satellite resources are extremely scarce relative to observation demands. Consequently, Earth observation satellite (EOS) scheduling is a problem of significant importance. Because the problem is NP-hard, an optimal solution is difficult to obtain, and real-time scheduling makes it even more challenging. Although fruitful results have been achieved in EOS scheduling, a number of obvious limitations and drawbacks remain. For example, response speed and stability are limited when scheduling urgent tasks that arrive stochastically. To overcome this obstacle, this paper proposes a reinforcement learning algorithm that can respond quickly when scheduling urgent tasks. To improve scheduling stability and reduce computational complexity, the algorithm is built on a two-layer hierarchical architecture. In each layer, we adopt an online learning paradigm to explore a scheduling strategy during the learning phase: when urgent tasks arrive at random, the satellite takes a scheduling action according to its current strategy, and the environment feeds back a reward for the action taken. After repeated feedback, the satellite learns to select the action that yields the greatest benefit. In practical space applications, the satellite can use the learned strategy to perform low-orbit satellite selection and observation time window (OTW) assignment for urgent tasks in stochastic scenarios, achieving immediate scheduling while maximizing scheduling stability. Finally, a numerical experiment is performed.
The simulation results demonstrate that, compared with the heuristic m-WSITF algorithm, the proposed algorithm has significant advantages in effectiveness and efficiency, especially in response speed and stability.
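The abstract describes the learning loop only at a high level. As a rough illustration, here is standard tabular Q-learning (the technique named in the keywords) arranged in two layers, under assumptions that are ours rather than the paper's: the upper layer picks a satellite for an arriving task type, the lower layer picks an OTW on that satellite, each urgent task is treated as a one-step episode, and the reward table, problem sizes, and hyperparameters (`REWARD`, `ALPHA`, `GAMMA`, `EPSILON`) are invented for a toy instance. It is not the authors' state, action, or reward design.

```python
import random
from collections import defaultdict

# Hypothetical toy instance (not from the paper): 3 urgent-task types
# (e.g. target regions), 2 agile satellites, 3 candidate OTWs per satellite.
NUM_TASK_TYPES, NUM_SATS, NUM_WINDOWS = 3, 2, 3

# Hypothetical reward table REWARD[task][sat][window]: observation benefit
# net of a penalty for disturbing the existing plan (a stand-in for stability).
# 0.0 marks a window that cannot observe the target.
REWARD = [
    [[1.0, 0.2, 0.0], [0.0, 0.5, 0.3]],
    [[0.0, 0.0, 0.4], [0.9, 0.1, 0.0]],
    [[0.3, 0.8, 0.0], [0.2, 0.0, 0.4]],
]

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1  # learning rate, discount, exploration rate


def epsilon_greedy(q_row, n_actions, rng):
    """Explore with probability EPSILON, otherwise exploit the best-known action."""
    if rng.random() < EPSILON:
        return rng.randrange(n_actions)
    return max(range(n_actions), key=lambda a: q_row[a])


def q_update(q, state, action, reward, next_state, done, n_actions):
    """Tabular Q-learning: move Q(s, a) toward r + gamma * max_a' Q(s', a')."""
    target = reward
    if not done:
        target += GAMMA * max(q[next_state][a] for a in range(n_actions))
    q[state][action] += ALPHA * (target - q[state][action])


def train(episodes=20000, seed=0):
    rng = random.Random(seed)
    q_upper = defaultdict(lambda: [0.0] * NUM_SATS)     # state: task type
    q_lower = defaultdict(lambda: [0.0] * NUM_WINDOWS)  # state: (task type, satellite)
    for _ in range(episodes):
        task = rng.randrange(NUM_TASK_TYPES)            # an urgent task arrives at random
        sat = epsilon_greedy(q_upper[task], NUM_SATS, rng)            # layer 1: satellite
        win = epsilon_greedy(q_lower[(task, sat)], NUM_WINDOWS, rng)  # layer 2: OTW
        r = REWARD[task][sat][win]                      # environment feedback
        # Each task is a one-step episode here, so both updates are terminal.
        q_update(q_lower, (task, sat), win, r, None, True, NUM_WINDOWS)
        q_update(q_upper, task, sat, r, None, True, NUM_SATS)
    return q_upper, q_lower


def schedule(task, q_upper, q_lower):
    """Greedy use of the learned tables: one lookup per layer, no search."""
    sat = max(range(NUM_SATS), key=lambda a: q_upper[task][a])
    win = max(range(NUM_WINDOWS), key=lambda a: q_lower[(task, sat)][a])
    return sat, win


if __name__ == "__main__":
    q_upper, q_lower = train()
    for task in range(NUM_TASK_TYPES):
        print(task, schedule(task, q_upper, q_lower))
```

Once trained, `schedule` answers a new urgent task with two table lookups instead of a search, which is the kind of fast response the abstract attributes to a learned strategy. A realistic scheduler would need far richer states and constraints than this single task-type index.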

Original language: English
Article number: 9272312
Pages (from-to): 220523-220532
Number of pages: 10
Journal: IEEE Access
Volume: 8
State: Published - 2020

Keywords

  • Hierarchical architecture
  • multiple agile satellite
  • Q learning
  • real-time scheduling

