TY - JOUR
T1 - Hierarchical Reinforcement Learning for UAV-PE Game With Alternative Delay Update Method
AU - Ma, Xiao
AU - Yuan, Yuan
AU - Guo, Lei
N1 - Publisher Copyright:
IEEE
PY - 2024
Y1 - 2024
N2 - This article proposes a novel hierarchical reinforcement learning (HRL) algorithm for unmanned aerial vehicle pursuit-evasion (UAV-PE) game systems with an alternative delay update (ADU) method. In the proposed algorithm, approximate solutions of the UAV-PE game problem are derived from a hierarchical learning process, which relies on a zero-sum game process at the kinematics level and a corresponding optimal process at the dynamics level. Deep neural networks (NNs) are used to approximate the policy and value functions of the UAV-PE game systems at both levels. Furthermore, the ADU method is adopted to improve the training efficiency of the deep NNs by fixing one player of the UAV-PE game system, thereby forming a stable environment. The goal of this article is to develop an HRL algorithm with an ADU method for obtaining approximate Nash equilibrium (NE) solutions of the considered UAV-PE game systems, which are subject to the coupling of kinematics and dynamics. Subsequently, sufficient conditions are provided for analyzing the convergence and optimality of the proposed HRL algorithm. Moreover, overload inequalities are obtained to guarantee that the state of the dynamics tracks the control input of the kinematics in the UAV-PE game systems. Finally, simulation examples are provided to demonstrate the feasibility and usefulness of the proposed HRL algorithm and ADU method.
AB - This article proposes a novel hierarchical reinforcement learning (HRL) algorithm for unmanned aerial vehicle pursuit-evasion (UAV-PE) game systems with an alternative delay update (ADU) method. In the proposed algorithm, approximate solutions of the UAV-PE game problem are derived from a hierarchical learning process, which relies on a zero-sum game process at the kinematics level and a corresponding optimal process at the dynamics level. Deep neural networks (NNs) are used to approximate the policy and value functions of the UAV-PE game systems at both levels. Furthermore, the ADU method is adopted to improve the training efficiency of the deep NNs by fixing one player of the UAV-PE game system, thereby forming a stable environment. The goal of this article is to develop an HRL algorithm with an ADU method for obtaining approximate Nash equilibrium (NE) solutions of the considered UAV-PE game systems, which are subject to the coupling of kinematics and dynamics. Subsequently, sufficient conditions are provided for analyzing the convergence and optimality of the proposed HRL algorithm. Moreover, overload inequalities are obtained to guarantee that the state of the dynamics tracks the control input of the kinematics in the UAV-PE game systems. Finally, simulation examples are provided to demonstrate the feasibility and usefulness of the proposed HRL algorithm and ADU method.
KW - Alternative delay update (ADU)
KW - Approximation algorithms
KW - Artificial neural networks
KW - Autonomous aerial vehicles
KW - Games
KW - Heuristic algorithms
KW - Kinematics
KW - Training
KW - hierarchical reinforcement learning (HRL)
KW - neural networks (NNs)
KW - unmanned aerial vehicle pursuit-evasion (UAV-PE) game
UR - http://www.scopus.com/inward/record.url?scp=85186083796&partnerID=8YFLogxK
U2 - 10.1109/TNNLS.2024.3362969
DO - 10.1109/TNNLS.2024.3362969
M3 - Article
AN - SCOPUS:85186083796
SN - 2162-237X
SP - 1
EP - 13
JO - IEEE Transactions on Neural Networks and Learning Systems
JF - IEEE Transactions on Neural Networks and Learning Systems
ER -