Hierarchical Reinforcement-Learning for Real-Time Scheduling of Agile Satellites

Lili Ren; Xin Ning; Jiayin Li

doi:10.1109/ACCESS.2020.3040748

School of Astronautics

Research output: Contribution to journal › Article › peer-review

17 Scopus citations

Abstract

Satellite resources are extremely scarce relative to observation demands, which makes Earth observation satellite (EOS) scheduling a problem of significant importance. Because the problem is NP-hard, an optimal solution is difficult to obtain, and real-time scheduling makes it more challenging still. Although research on EOS scheduling has produced fruitful results, a number of limitations remain. For example, response speed and stability remain limited when scheduling urgent tasks that arrive stochastically. To overcome this obstacle, this paper proposes a reinforcement learning algorithm that can respond quickly when scheduling urgent tasks. To improve scheduling stability and reduce computational complexity, the algorithm uses a hierarchical architecture with two layers. In each layer, an online learning paradigm is adopted to explore a scheduling strategy during the learning phase. Under the algorithm, when urgent tasks arrive randomly, the satellite takes a scheduling action according to its current strategy. The environment then feeds back to the satellite the reward corresponding to the action taken. After repeated feedback, the satellite selects the action that obtains the greatest benefit. In practical space applications, the satellite can employ the learned strategy to perform low-orbit satellite selection and observation time window (OTW) assignment for urgent tasks in stochastic scenarios, which achieves immediate scheduling while maximizing scheduling stability. Finally, a numerical experiment is performed. The simulation results demonstrate that, compared to the heuristic m-WSITF algorithm, the proposed algorithm has significant advantages in effectiveness and efficiency, especially in response speed and stability.
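The learning loop described in the abstract (take an action, receive a reward, come to prefer the most beneficial action) can be illustrated with a minimal tabular Q-learning sketch. This is not the authors' implementation: the abstract does not give the state, action, or reward definitions, so everything below is an assumption made for illustration. The two layers are read here as one table that picks a satellite and a second that picks an OTW on that satellite; each urgent task is treated as a one-step episode; and the names `QTable`, `satellite_reward`, `window_reward`, `train`, and `schedule`, along with the toy sizes and reward values, are hypothetical.

```python
import random
from collections import defaultdict

# Hypothetical toy sizes, not taken from the paper.
N_TASK_TYPES, N_SATELLITES, N_WINDOWS = 3, 3, 4


class QTable:
    """Tabular Q-learning with epsilon-greedy exploration."""

    def __init__(self, n_actions, alpha=0.5, gamma=0.9, epsilon=0.2, rng=None):
        self.n_actions = n_actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = rng if rng is not None else random.Random(0)
        # Unseen states start with a Q-value of 0.0 for every action.
        self.q = defaultdict(lambda: [0.0] * n_actions)

    def act(self, state, greedy=False):
        # Explore with probability epsilon, otherwise take the best-known action.
        if not greedy and self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_actions)
        values = self.q[state]
        return max(range(self.n_actions), key=values.__getitem__)

    def update(self, state, action, reward, next_state=None):
        # Standard Q-learning target; next_state=None marks a terminal step.
        target = reward
        if next_state is not None:
            target += self.gamma * max(self.q[next_state])
        self.q[state][action] += self.alpha * (target - self.q[state][action])


def satellite_reward(task, sat):
    # Assumed toy reward: one satellite is best placed for each task type.
    return 1.0 if sat == task else 0.3


def window_reward(task, sat, window):
    # Assumed toy reward: one window is best for each (task, satellite) pair.
    return 1.0 if window == (task + sat) % N_WINDOWS else 0.3


def train(episodes=3000, seed=0):
    rng = random.Random(seed)
    upper = QTable(N_SATELLITES, rng=rng)  # layer 1: satellite selection
    lower = QTable(N_WINDOWS, rng=rng)     # layer 2: OTW assignment
    for _ in range(episodes):
        task = rng.randrange(N_TASK_TYPES)  # an urgent task arrives at random
        sat = upper.act(task)
        window = lower.act((task, sat))
        # Each layer learns from its own reward (an assumption of this sketch).
        upper.update(task, sat, satellite_reward(task, sat))
        lower.update((task, sat), window, window_reward(task, sat, window))
    return upper, lower


def schedule(upper, lower, task):
    # After learning, act greedily: no search is needed at decision time.
    sat = upper.act(task, greedy=True)
    window = lower.act((task, sat), greedy=True)
    return sat, window
```

Calling `upper, lower = train()` and then `schedule(upper, lower, task)` returns a `(satellite, window)` pair using two table lookups. That is one way to read the abstract's claim of fast response: the cost of exploration is paid during learning rather than when an urgent task arrives. Splitting the decision into two small tables also keeps each table's action set small, which is the usual motivation for a hierarchical design.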

Original language	English
Article number	9272312
Pages (from-to)	220523-220532
Number of pages	10
Journal	IEEE Access
Volume	8
DOIs	https://doi.org/10.1109/ACCESS.2020.3040748
State	Published - 2020

Keywords

Hierarchical architecture
multiple agile satellite
Q learning
real-time scheduling

Cite this

@article{3a9775cb1e95453ca14a81b9e0f7625d,

title = "Hierarchical Reinforcement-Learning for Real-Time Scheduling of Agile Satellites",

keywords = "Hierarchical architecture, multiple agile satellite, Q learning, real-time scheduling",

author = "Lili Ren and Xin Ning and Jiayin Li",

note = "Publisher Copyright: {\textcopyright} 2013 IEEE.",

year = "2020",

doi = "10.1109/ACCESS.2020.3040748",

language = "English",

volume = "8",

pages = "220523--220532",

journal = "IEEE Access",

issn = "2169-3536",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

}

TY - JOUR

T1 - Hierarchical Reinforcement-Learning for Real-Time Scheduling of Agile Satellites

AU - Ren, Lili

AU - Ning, Xin

AU - Li, Jiayin

PY - 2020

Y1 - 2020

KW - Hierarchical architecture

KW - multiple agile satellite

KW - Q learning

KW - real-time scheduling

UR - http://www.scopus.com/inward/record.url?scp=85097936856&partnerID=8YFLogxK

U2 - 10.1109/ACCESS.2020.3040748

DO - 10.1109/ACCESS.2020.3040748

M3 - Article

AN - SCOPUS:85097936856

SN - 2169-3536

VL - 8

SP - 220523

EP - 220532

JO - IEEE Access

JF - IEEE Access

M1 - 9272312

ER -
