Hierarchical Maneuver Decision Method Based on PG-Option for UAV Pursuit-Evasion Game

Bo Li; Haohui Zhang; Pingkuan He; Geng Wang; Kaiqiang Yue; Evgeny Neretin

doi:10.3390/drones7070449


School of Electronics and Information

Research output: Contribution to journal › Article › Peer-review

6 Citations (Scopus)

Abstract

To address the autonomous decision-making problem in the unmanned aerial vehicle (UAV) pursuit-evasion game, this paper proposes a hierarchical maneuver decision method based on the PG-option. First, after comprehensively considering the possible relative situations of the two sides, four maneuver decision options are designed: advantage game, quick escape, situation change, and quick pursuit. Each option is trained with Soft Actor-Critic (SAC) to obtain the corresponding meta-policy. Second, to avoid a high-dimensional state space in the hierarchical model, the policy gradient (PG) algorithm is combined with the traditional option-based hierarchical reinforcement learning algorithm, and PG is used to train the policy selector as the top-level strategy. Finally, to prevent frequent switching between meta-policies, a delayed selection is applied to the policy selector, and expert experience is introduced to design the termination functions of the meta-policies, which improves the flexibility of policy switching. Simulation experiments show that the PG-option algorithm performs well in the UAV pursuit-evasion game and adapts to various environments by switching to the meta-policy appropriate to the current situation.
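The abstract describes a two-level control loop: a top-level policy selector, trained with a policy gradient method, picks one of four option meta-policies, and switching is damped by delayed selection plus an expert-designed termination function. The Python sketch below illustrates that structure only, under loudly stated assumptions: the selector is a linear softmax policy trained with plain REINFORCE, "delayed selection" is modeled as a minimum hold of `min_hold` steps, `terminate(option, s)` is a user-supplied placeholder for the paper's expert termination rule, and the meta-policies are stubs standing in for the SAC-trained ones. All names (`OPTIONS`, `PolicySelector`, `run_episode`, `toy_step`, `metas`) are hypothetical and not taken from the paper.

```python
import math
import random
from typing import List, Sequence, Tuple

# Hypothetical labels for the four options named in the abstract (index = option id).
OPTIONS = ["advantage_game", "quick_escape", "situation_change", "quick_pursuit"]
State = Sequence[float]


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


class PolicySelector:
    """Assumed top-level selector: linear softmax over options, trained with REINFORCE."""

    def __init__(self, n_features: int, n_options: int, lr: float = 0.05, seed: int = 0):
        self.w = [[0.0] * n_features for _ in range(n_options)]
        self.lr = lr
        self.rng = random.Random(seed)

    def probs(self, s: State) -> List[float]:
        return softmax([sum(wi * si for wi, si in zip(row, s)) for row in self.w])

    def sample(self, s: State) -> int:
        p = self.probs(s)
        return self.rng.choices(range(len(p)), weights=p)[0]

    def update(self, segments: List[Tuple[State, int, float]], gamma: float = 0.99) -> None:
        """REINFORCE over (decision_state, option, segment_reward), one tuple per option call."""
        g, returns = 0.0, []
        for _, _, r in reversed(segments):
            g = r + gamma * g  # discounted return counted in option decisions
            returns.append(g)
        returns.reverse()
        # Accumulate the gradient with the weights held fixed, then apply it once.
        grad = [[0.0] * len(row) for row in self.w]
        for (s, o, _), g in zip(segments, returns):
            p = self.probs(s)
            for k in range(len(self.w)):
                # d log pi(o|s) / d logit_k = 1[k == o] - p_k, scaled by the return
                coef = ((1.0 if k == o else 0.0) - p[k]) * g
                for j, sj in enumerate(s):
                    grad[k][j] += coef * sj
        for k, row in enumerate(self.w):
            for j in range(len(row)):
                row[j] += self.lr * grad[k][j]


def run_episode(selector, env_step, s0, meta_policies, terminate, min_hold=5, max_steps=200):
    """Run options; re-select only after min_hold steps AND when terminate(option, s) fires."""
    s, segments = s0, []
    option, held, start, seg_r = selector.sample(s0), 0, s0, 0.0
    for _ in range(max_steps):
        a = meta_policies[option](s)  # low-level meta-policy (SAC-trained in the paper; stub here)
        s, r, done = env_step(s, a)
        seg_r += r
        held += 1
        if done or (held >= min_hold and terminate(option, s)):
            segments.append((start, option, seg_r))
            if done:
                return segments
            option, held, start, seg_r = selector.sample(s), 0, s, 0.0
    if held > 0:
        segments.append((start, option, seg_r))  # close the unfinished final segment
    return segments


# Toy check with a hypothetical environment: only option 0 ever earns reward.
def toy_step(s, a):
    return s, (1.0 if a > 0 else 0.0), False


metas = [lambda s: 1.0] + [lambda s: -1.0] * 3
sel = PolicySelector(n_features=1, n_options=len(OPTIONS), lr=0.05, seed=0)
for _ in range(300):
    episode = run_episode(sel, toy_step, [1.0], metas, lambda o, s: False, min_hold=5, max_steps=5)
    sel.update(episode)
print([round(p, 3) for p in sel.probs([1.0])])
```

In the toy run only option 0 produces reward, so the selector's probability for it should rise well above the uniform 0.25. A real pursuit-evasion setup would replace `toy_step` with aircraft dynamics and relative-geometry features, and the stubs with trained SAC actors; the minimum-hold rule is just one plausible reading of "delayed selection".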

Original language	English
Article number	449
Journal	Drones
Volume	7
Issue number	7
DOI	https://doi.org/10.3390/drones7070449
Publication status	Published - Jul 2023

Cite this

@article{bfd3a6f33bf74a91b4c8ea3d293ef5e2,
  title = "Hierarchical Maneuver Decision Method Based on {PG-Option} for {UAV} Pursuit-Evasion Game",
  abstract = "To address the autonomous decision-making problem in the unmanned aerial vehicle (UAV) pursuit-evasion game, this paper proposes a hierarchical maneuver decision method based on the PG-option. First, after comprehensively considering the possible relative situations of the two sides, four maneuver decision options are designed: advantage game, quick escape, situation change, and quick pursuit. Each option is trained with Soft Actor-Critic (SAC) to obtain the corresponding meta-policy. Second, to avoid a high-dimensional state space in the hierarchical model, the policy gradient (PG) algorithm is combined with the traditional option-based hierarchical reinforcement learning algorithm, and PG is used to train the policy selector as the top-level strategy. Finally, to prevent frequent switching between meta-policies, a delayed selection is applied to the policy selector, and expert experience is introduced to design the termination functions of the meta-policies, which improves the flexibility of policy switching. Simulation experiments show that the PG-option algorithm performs well in the UAV pursuit-evasion game and adapts to various environments by switching to the meta-policy appropriate to the current situation.",
  keywords = "hierarchical reinforcement learning, meta-policy, policy gradient, UAV pursuit-evasion game",
  author = "Bo Li and Haohui Zhang and Pingkuan He and Geng Wang and Kaiqiang Yue and Evgeny Neretin",
  note = "Publisher Copyright: {\textcopyright} 2023 by the authors.",
  year = "2023",
  month = jul,
  doi = "10.3390/drones7070449",
  language = "English",
  volume = "7",
  journal = "Drones",
  issn = "2504-446X",
  publisher = "Multidisciplinary Digital Publishing Institute (MDPI)",
  number = "7",
}

TY  - JOUR
T1  - Hierarchical Maneuver Decision Method Based on PG-Option for UAV Pursuit-Evasion Game
AU  - Li, Bo
AU  - Zhang, Haohui
AU  - He, Pingkuan
AU  - Wang, Geng
AU  - Yue, Kaiqiang
AU  - Neretin, Evgeny
PY  - 2023/7
Y1  - 2023/7
N2  - To address the autonomous decision-making problem in the unmanned aerial vehicle (UAV) pursuit-evasion game, this paper proposes a hierarchical maneuver decision method based on the PG-option. First, after comprehensively considering the possible relative situations of the two sides, four maneuver decision options are designed: advantage game, quick escape, situation change, and quick pursuit. Each option is trained with Soft Actor-Critic (SAC) to obtain the corresponding meta-policy. Second, to avoid a high-dimensional state space in the hierarchical model, the policy gradient (PG) algorithm is combined with the traditional option-based hierarchical reinforcement learning algorithm, and PG is used to train the policy selector as the top-level strategy. Finally, to prevent frequent switching between meta-policies, a delayed selection is applied to the policy selector, and expert experience is introduced to design the termination functions of the meta-policies, which improves the flexibility of policy switching. Simulation experiments show that the PG-option algorithm performs well in the UAV pursuit-evasion game and adapts to various environments by switching to the meta-policy appropriate to the current situation.
AB  - To address the autonomous decision-making problem in the unmanned aerial vehicle (UAV) pursuit-evasion game, this paper proposes a hierarchical maneuver decision method based on the PG-option. First, after comprehensively considering the possible relative situations of the two sides, four maneuver decision options are designed: advantage game, quick escape, situation change, and quick pursuit. Each option is trained with Soft Actor-Critic (SAC) to obtain the corresponding meta-policy. Second, to avoid a high-dimensional state space in the hierarchical model, the policy gradient (PG) algorithm is combined with the traditional option-based hierarchical reinforcement learning algorithm, and PG is used to train the policy selector as the top-level strategy. Finally, to prevent frequent switching between meta-policies, a delayed selection is applied to the policy selector, and expert experience is introduced to design the termination functions of the meta-policies, which improves the flexibility of policy switching. Simulation experiments show that the PG-option algorithm performs well in the UAV pursuit-evasion game and adapts to various environments by switching to the meta-policy appropriate to the current situation.
KW  - hierarchical reinforcement learning
KW  - meta-policy
KW  - policy gradient
KW  - UAV pursuit-evasion game
UR  - http://www.scopus.com/inward/record.url?scp=85166347033&partnerID=8YFLogxK
U2  - 10.3390/drones7070449
DO  - 10.3390/drones7070449
M3  - Article
AN  - SCOPUS:85166347033
SN  - 2504-446X
VL  - 7
JO  - Drones
JF  - Drones
IS  - 7
M1  - 449
ER  - 