Hierarchical Reinforcement Learning for UAV-PE Game With Alternative Delay Update Method

Xiao Ma; Yuan Yuan; Lei Guo

doi:10.1109/TNNLS.2024.3362969

School of Astronautics

Research output: Contribution to journal › Article › peer-review

Abstract

This article proposes a novel hierarchical reinforcement learning (HRL) algorithm for unmanned aerial vehicle pursuit-evasion (UAV-PE) game systems with an alternative delay update (ADU) method. In the proposed algorithm, the approximate solutions of the UAV-PE game problem are derived from a hierarchical learning process, which relies on a zero-sum game process of kinematics and a corresponding optimal process of dynamics. In this case, deep neural networks (NNs) are used to approximate the policy and value functions of UAV-PE game systems at the kinematics and dynamics levels. Furthermore, the ADU method is adopted to improve the training efficiency of the deep NNs by fixing one player of the UAV-PE game systems to form a stable environment. The goal of this article is to develop an HRL algorithm with an ADU method for obtaining approximate Nash equilibrium (NE) solutions of the considered UAV-PE game systems, which are subject to the coupling of kinematics and dynamics. Subsequently, sufficient conditions are provided for analyzing the convergence and optimality of the proposed HRL algorithm. Moreover, the inequalities of overload are obtained to guarantee that the state of dynamics tracks the control input of kinematics in UAV-PE game systems. Finally, simulation examples are provided to demonstrate the feasibility and usefulness of the proposed HRL algorithm and ADU method.
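
The abstract's description of fixing one player so that the other trains against a stable environment can be illustrated with a toy sketch. This is not the article's algorithm: the scalar payoff, the function name `alternating_freeze_training`, and all parameter values are assumptions chosen for illustration, and plain gradient steps stand in for the deep NN updates.

```python
def alternating_freeze_training(x=1.0, y=-1.0, rounds=50, inner_steps=10, lr=0.1):
    """Toy zero-sum game (hypothetical, not the UAV-PE model).

    The pursuer controls x and minimizes f; the evader controls y and
    maximizes f, where f(x, y) = 0.5*x**2 + 0.5*x*y - 0.5*y**2.
    The saddle point (Nash equilibrium) of this f is (0, 0).
    """
    for _ in range(rounds):
        # Phase 1: evader is frozen, so the pursuer faces a stationary problem.
        for _ in range(inner_steps):
            x -= lr * (x + 0.5 * y)   # gradient descent, df/dx = x + 0.5*y
        # Phase 2: pursuer is frozen, the evader updates against it.
        for _ in range(inner_steps):
            y += lr * (0.5 * x - y)   # gradient ascent, df/dy = 0.5*x - y
    return x, y


if __name__ == "__main__":
    x_final, y_final = alternating_freeze_training()
    print(x_final, y_final)  # both values end up very close to 0
```

In this toy setting each player only ever optimizes against a fixed opponent, and the alternation still drives both strategies toward the saddle point. Whether the same holds for the coupled kinematics and dynamics of the UAV-PE game depends on the sufficient conditions the article provides.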

Original language	English
Pages (from-to)	1-13
Number of pages	13
Journal	IEEE Transactions on Neural Networks and Learning Systems
DOIs	https://doi.org/10.1109/TNNLS.2024.3362969
State	Accepted/In press - 2024

Keywords

Alternative delay update (ADU)
Approximation algorithms
Artificial neural networks
Autonomous aerial vehicles
Games
Heuristic algorithms
Kinematics
Training
hierarchical reinforcement learning (HRL)
neural networks (NNs)
unmanned aerial vehicle pursuit-evasion (UAV-PE) game

Access to Document

10.1109/TNNLS.2024.3362969

Cite this

@article{7b144cffb3c748268931152a8bc4c6f7,

title = "Hierarchical Reinforcement Learning for UAV-PE Game With Alternative Delay Update Method",

abstract = "This article proposes a novel hierarchical reinforcement learning (HRL) algorithm for unmanned aerial vehicle pursuit-evasion (UAV-PE) game systems with an alternative delay update (ADU) method. In the proposed algorithm, the approximate solutions of the UAV-PE game problem are derived from a hierarchical learning process, which relies on a zero-sum game process of kinematics and a corresponding optimal process of dynamics. In this case, deep neural networks (NNs) are used to approximate the policy and value functions of UAV-PE game systems in kinematics and dynamics level. Furthermore, the ADU method is adopted to improve the training efficiency of deep NN by fixing one player of the UAV-PE game systems to form a stable environment. The goal of this article is to develop an HRL algorithm with an ADU method for obtaining approximate Nash equilibrium (NE) solutions of the considered UAV-PE game systems which are subjected to the coupling of kinematics and dynamics. Subsequently, sufficient conditions are provided for analyzing the convergence and optimality of the proposed HRL algorithm. Moreover, the inequalities of overload are obtained to guarantee that the state of dynamics tracks with the control input of kinematics in UAV-PE game systems. Finally, simulation examples are provided to demonstrate the feasibility and usefulness of the proposed HRL algorithm and ADU method.",

keywords = "Alternative delay update (ADU), Approximation algorithms, Artificial neural networks, Autonomous aerial vehicles, Games, Heuristic algorithms, Kinematics, Training, hierarchical reinforcement learning (HRL), neural networks (NNs), unmanned aerial vehicle pursuit-evasion (UAV-PE) game",

author = "Xiao Ma and Yuan Yuan and Lei Guo",

note = "Publisher Copyright: IEEE",

year = "2024",

doi = "10.1109/TNNLS.2024.3362969",

language = "English",

pages = "1--13",

journal = "IEEE Transactions on Neural Networks and Learning Systems",

issn = "2162-237X",

publisher = "IEEE Computational Intelligence Society",

}

TY - JOUR

T1 - Hierarchical Reinforcement Learning for UAV-PE Game With Alternative Delay Update Method

AU - Ma, Xiao

AU - Yuan, Yuan

AU - Guo, Lei

N1 - Publisher Copyright: IEEE

PY - 2024

Y1 - 2024

N2 - This article proposes a novel hierarchical reinforcement learning (HRL) algorithm for unmanned aerial vehicle pursuit-evasion (UAV-PE) game systems with an alternative delay update (ADU) method. In the proposed algorithm, the approximate solutions of the UAV-PE game problem are derived from a hierarchical learning process, which relies on a zero-sum game process of kinematics and a corresponding optimal process of dynamics. In this case, deep neural networks (NNs) are used to approximate the policy and value functions of UAV-PE game systems in kinematics and dynamics level. Furthermore, the ADU method is adopted to improve the training efficiency of deep NN by fixing one player of the UAV-PE game systems to form a stable environment. The goal of this article is to develop an HRL algorithm with an ADU method for obtaining approximate Nash equilibrium (NE) solutions of the considered UAV-PE game systems which are subjected to the coupling of kinematics and dynamics. Subsequently, sufficient conditions are provided for analyzing the convergence and optimality of the proposed HRL algorithm. Moreover, the inequalities of overload are obtained to guarantee that the state of dynamics tracks with the control input of kinematics in UAV-PE game systems. Finally, simulation examples are provided to demonstrate the feasibility and usefulness of the proposed HRL algorithm and ADU method.

AB - This article proposes a novel hierarchical reinforcement learning (HRL) algorithm for unmanned aerial vehicle pursuit-evasion (UAV-PE) game systems with an alternative delay update (ADU) method. In the proposed algorithm, the approximate solutions of the UAV-PE game problem are derived from a hierarchical learning process, which relies on a zero-sum game process of kinematics and a corresponding optimal process of dynamics. In this case, deep neural networks (NNs) are used to approximate the policy and value functions of UAV-PE game systems in kinematics and dynamics level. Furthermore, the ADU method is adopted to improve the training efficiency of deep NN by fixing one player of the UAV-PE game systems to form a stable environment. The goal of this article is to develop an HRL algorithm with an ADU method for obtaining approximate Nash equilibrium (NE) solutions of the considered UAV-PE game systems which are subjected to the coupling of kinematics and dynamics. Subsequently, sufficient conditions are provided for analyzing the convergence and optimality of the proposed HRL algorithm. Moreover, the inequalities of overload are obtained to guarantee that the state of dynamics tracks with the control input of kinematics in UAV-PE game systems. Finally, simulation examples are provided to demonstrate the feasibility and usefulness of the proposed HRL algorithm and ADU method.

KW - Alternative delay update (ADU)

KW - Approximation algorithms

KW - Artificial neural networks

KW - Autonomous aerial vehicles

KW - Games

KW - Heuristic algorithms

KW - Kinematics

KW - Training

KW - hierarchical reinforcement learning (HRL)

KW - neural networks (NNs)

KW - unmanned aerial vehicle pursuit-evasion (UAV-PE) game

UR - http://www.scopus.com/inward/record.url?scp=85186083796&partnerID=8YFLogxK

U2 - 10.1109/TNNLS.2024.3362969

DO - 10.1109/TNNLS.2024.3362969

M3 - Article

AN - SCOPUS:85186083796

SN - 2162-237X

SP - 1

EP - 13

JO - IEEE Transactions on Neural Networks and Learning Systems

JF - IEEE Transactions on Neural Networks and Learning Systems

ER -
