A Reinforcement Learning Scheme for Active Multi-Debris Removal Mission Planning with Modified Upper Confidence Bound Tree Search

Jianan Yang; Xiaolei Hou; Yu Hen Hu; Yong Liu; Quan Pan

doi:10.1109/ACCESS.2020.3001311

School of Automation

Research output: Contribution to journal › Article › Peer-reviewed

12 citations (Scopus)

Abstract

The growing amount of space debris poses a critical threat to the space environment. Mission planning for active multi-debris removal (ADR) with a maximal-reward objective is attracting increasing attention. Because the goal of Reinforcement Learning (RL) matches the maximal-reward optimization model of ADR, planning can be made more efficient with a suitable RL scheme and RL algorithm. In this paper, an RL formulation is first presented for the ADR mission planning problem, in which all the basic components of the maximal-reward optimization model are recast in an RL scheme. Second, a modified Upper Confidence bound Tree (UCT) search algorithm is developed for the ADR planning task. It leverages neural-network-assisted selection and expansion procedures to facilitate exploration, and it incorporates roll-out simulation in the backup procedure to achieve robust value estimation. This algorithm fits the RL scheme of ADR mission planning and better balances exploration and exploitation. Experimental comparison on three subsets of the Iridium 33 debris cloud data shows that the modified UCT performs better than previously reported results and closely related UCT variants.
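The abstract names the ingredients of the modified UCT (network-assisted selection and expansion, roll-out simulation in the backup) but does not give the formulas. The Python sketch below is therefore only an illustration under assumptions: the prior-weighted exploration bonus is a generic PUCT-style rule, the linear blend of a network value with a roll-out return is one plausible way to combine the two estimates, and every name (`Node`, `select_action`, `backup`, `c`, `mix`) is hypothetical rather than taken from the paper.

```python
import math


class Node:
    """One node of the search tree; `prior` stands in for a network output (assumed)."""

    def __init__(self, prior=1.0):
        self.prior = prior
        self.visits = 0
        self.value_sum = 0.0
        self.children = {}  # maps an action (e.g. next debris id) to a child Node

    def mean_value(self):
        # Average backed-up value; 0.0 for a node that was never visited.
        return self.value_sum / self.visits if self.visits else 0.0


def select_action(node, c=1.4):
    """Return the action whose child maximizes mean value plus exploration bonus."""

    def score(child):
        # Assumed PUCT-style bonus: larger for high-prior, rarely visited children.
        bonus = c * child.prior * math.sqrt(node.visits + 1) / (1 + child.visits)
        return child.mean_value() + bonus

    return max(node.children, key=lambda action: score(node.children[action]))


def backup(path, network_value, rollout_return, mix=0.5):
    """Propagate a blend of a network estimate and a roll-out return along `path`."""
    # Assumed linear blend; the paper's actual combination rule is not stated here.
    leaf_value = (1 - mix) * network_value + mix * rollout_return
    for node in path:
        node.visits += 1
        node.value_sum += leaf_value
    return leaf_value
```

A search iteration in this sketch would call `select_action` repeatedly from the root to reach a leaf, expand it with children whose priors come from the network, and then call `backup` on the visited path. The `mix` parameter controls how much the roll-out return counts relative to the network estimate.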

Original language	English
Article number	9113461
Pages (from-to)	108461-108473
Number of pages	13
Journal	IEEE Access
Volume	8
DOI	https://doi.org/10.1109/ACCESS.2020.3001311
Publication status	Published - 2020

Cite this

@article{cc2f83a57a384e5aa7ed9b0e4af5b445,

title = "A Reinforcement Learning Scheme for Active Multi-Debris Removal Mission Planning with Modified Upper Confidence Bound Tree Search",

abstract = "The increasing number of space debris is a critical impact on space environment. Active multi-debris removal (ADR) mission planning technique with maximal reward objective is getting more attention. As the goal of Reinforcement Learning (RL) is in accordance with maximal-reward optimization model of ADR, the planning will be more efficient with the advanced RL scheme and RL algorithm. In this paper, first, an RL formulation is presented for the ADR mission planning problem. All the basic components of maximal-reward optimization model are recast in RL scheme. Second, a modified Upper Confidence bound Tree (UCT) search algorithm for the ADR planning task is developed, which both leverages the neural-network-assisted selection and expansion procedures to facilitate exploration and incorporates roll-out simulation in the backup procedure to achieve robust value estimation. This algorithm fits the RL scheme of ADR mission planning and better balances the exploration and exploitation. Experimental comparison using three subsets of Iridium 33 debris cloud data reveals a better performance of this modified UCT over previously reported results and close UCT variants.",

keywords = "Monte Carlo tree search, multi-debris active removal, reinforcement learning, space mission planning",

author = "Jianan Yang and Xiaolei Hou and Hu, {Yu Hen} and Yong Liu and Quan Pan",

note = "Publisher Copyright: {\textcopyright} 2013 IEEE.",

year = "2020",

doi = "10.1109/ACCESS.2020.3001311",

language = "English",

volume = "8",

pages = "108461--108473",

journal = "IEEE Access",

issn = "2169-3536",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

}

TY - JOUR

T1 - A Reinforcement Learning Scheme for Active Multi-Debris Removal Mission Planning with Modified Upper Confidence Bound Tree Search

AU - Yang, Jianan

AU - Hou, Xiaolei

AU - Hu, Yu Hen

AU - Liu, Yong

AU - Pan, Quan

PY - 2020

Y1 - 2020

N2 - The increasing number of space debris is a critical impact on space environment. Active multi-debris removal (ADR) mission planning technique with maximal reward objective is getting more attention. As the goal of Reinforcement Learning (RL) is in accordance with maximal-reward optimization model of ADR, the planning will be more efficient with the advanced RL scheme and RL algorithm. In this paper, first, an RL formulation is presented for the ADR mission planning problem. All the basic components of maximal-reward optimization model are recast in RL scheme. Second, a modified Upper Confidence bound Tree (UCT) search algorithm for the ADR planning task is developed, which both leverages the neural-network-assisted selection and expansion procedures to facilitate exploration and incorporates roll-out simulation in the backup procedure to achieve robust value estimation. This algorithm fits the RL scheme of ADR mission planning and better balances the exploration and exploitation. Experimental comparison using three subsets of Iridium 33 debris cloud data reveals a better performance of this modified UCT over previously reported results and close UCT variants.

AB - The increasing number of space debris is a critical impact on space environment. Active multi-debris removal (ADR) mission planning technique with maximal reward objective is getting more attention. As the goal of Reinforcement Learning (RL) is in accordance with maximal-reward optimization model of ADR, the planning will be more efficient with the advanced RL scheme and RL algorithm. In this paper, first, an RL formulation is presented for the ADR mission planning problem. All the basic components of maximal-reward optimization model are recast in RL scheme. Second, a modified Upper Confidence bound Tree (UCT) search algorithm for the ADR planning task is developed, which both leverages the neural-network-assisted selection and expansion procedures to facilitate exploration and incorporates roll-out simulation in the backup procedure to achieve robust value estimation. This algorithm fits the RL scheme of ADR mission planning and better balances the exploration and exploitation. Experimental comparison using three subsets of Iridium 33 debris cloud data reveals a better performance of this modified UCT over previously reported results and close UCT variants.

KW - Monte Carlo tree search

KW - multi-debris active removal

KW - reinforcement learning

KW - space mission planning

UR - http://www.scopus.com/inward/record.url?scp=85086992818&partnerID=8YFLogxK

U2 - 10.1109/ACCESS.2020.3001311

DO - 10.1109/ACCESS.2020.3001311

M3 - Article

AN - SCOPUS:85086992818

SN - 2169-3536

VL - 8

SP - 108461

EP - 108473

JO - IEEE Access

JF - IEEE Access

M1 - 9113461

ER -
