TY - JOUR
T1 - Hierarchical Reinforcement Learning for UAV-PE Game With Alternative Delay Update Method
AU - Ma, Xiao
AU - Yuan, Yuan
AU - Guo, Lei
N1 - Publisher Copyright:
IEEE
PY - 2024
Y1 - 2024
N2 - This article proposes a novel hierarchical reinforcement learning (HRL) algorithm for unmanned aerial vehicle pursuit-evasion (UAV-PE) game systems with an alternative delay update (ADU) method. In the proposed algorithm, approximate solutions of the UAV-PE game problem are derived from a hierarchical learning process, which relies on a zero-sum game process at the kinematics level and a corresponding optimal process at the dynamics level. Deep neural networks (NNs) are used to approximate the policy and value functions of the UAV-PE game systems at both levels. Furthermore, the ADU method is adopted to improve the training efficiency of the deep NNs by fixing one player of the UAV-PE game system, thereby forming a stable environment. The goal of this article is to develop an HRL algorithm with an ADU method for obtaining approximate Nash equilibrium (NE) solutions of the considered UAV-PE game systems, which are subject to the coupling of kinematics and dynamics. Subsequently, sufficient conditions are provided for analyzing the convergence and optimality of the proposed HRL algorithm. Moreover, overload inequalities are obtained to guarantee that the state of the dynamics tracks the control input of the kinematics in the UAV-PE game systems. Finally, simulation examples are provided to demonstrate the feasibility and usefulness of the proposed HRL algorithm and ADU method.
AB - This article proposes a novel hierarchical reinforcement learning (HRL) algorithm for unmanned aerial vehicle pursuit-evasion (UAV-PE) game systems with an alternative delay update (ADU) method. In the proposed algorithm, approximate solutions of the UAV-PE game problem are derived from a hierarchical learning process, which relies on a zero-sum game process at the kinematics level and a corresponding optimal process at the dynamics level. Deep neural networks (NNs) are used to approximate the policy and value functions of the UAV-PE game systems at both levels. Furthermore, the ADU method is adopted to improve the training efficiency of the deep NNs by fixing one player of the UAV-PE game system, thereby forming a stable environment. The goal of this article is to develop an HRL algorithm with an ADU method for obtaining approximate Nash equilibrium (NE) solutions of the considered UAV-PE game systems, which are subject to the coupling of kinematics and dynamics. Subsequently, sufficient conditions are provided for analyzing the convergence and optimality of the proposed HRL algorithm. Moreover, overload inequalities are obtained to guarantee that the state of the dynamics tracks the control input of the kinematics in the UAV-PE game systems. Finally, simulation examples are provided to demonstrate the feasibility and usefulness of the proposed HRL algorithm and ADU method.
KW - Alternative delay update (ADU)
KW - Approximation algorithms
KW - Artificial neural networks
KW - Autonomous aerial vehicles
KW - Games
KW - Heuristic algorithms
KW - Kinematics
KW - Training
KW - hierarchical reinforcement learning (HRL)
KW - neural networks (NNs)
KW - unmanned aerial vehicle pursuit-evasion (UAV-PE) game
UR - http://www.scopus.com/inward/record.url?scp=85186083796&partnerID=8YFLogxK
U2 - 10.1109/TNNLS.2024.3362969
DO - 10.1109/TNNLS.2024.3362969
M3 - Article
AN - SCOPUS:85186083796
SN - 2162-237X
SP - 1
EP - 13
JO - IEEE Transactions on Neural Networks and Learning Systems
JF - IEEE Transactions on Neural Networks and Learning Systems
ER -