Hierarchical Reinforcement Learning for UAV-PE Game With Alternative Delay Update Method

Xiao Ma, Yuan Yuan, Lei Guo

Research output: Contribution to journal › Article › peer-review

Abstract

This article proposes a novel hierarchical reinforcement learning (HRL) algorithm for unmanned aerial vehicle pursuit-evasion (UAV-PE) game systems with an alternative delay update (ADU) method. In the proposed algorithm, approximate solutions of the UAV-PE game problem are derived from a hierarchical learning process, which relies on a zero-sum game process at the kinematics level and a corresponding optimal process at the dynamics level. Deep neural networks (NNs) are used to approximate the policy and value functions of the UAV-PE game systems at both the kinematics and dynamics levels. Furthermore, the ADU method is adopted to improve the training efficiency of the deep NNs by fixing one player of the UAV-PE game systems to form a stable environment. The goal of this article is to develop an HRL algorithm with an ADU method for obtaining approximate Nash equilibrium (NE) solutions of the considered UAV-PE game systems, which are subject to the coupling of kinematics and dynamics. Sufficient conditions are provided for analyzing the convergence and optimality of the proposed HRL algorithm. Moreover, inequalities on the overload are obtained to guarantee that the state of the dynamics tracks the control input of the kinematics in UAV-PE game systems. Finally, simulation examples are provided to demonstrate the feasibility and usefulness of the proposed HRL algorithm and ADU method.
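The idea of fixing one player so that the other learns against a stable environment can be illustrated on a toy problem. The sketch below is not the algorithm from the article: it assumes a scalar zero-sum game with a hypothetical payoff f(x, y) = 0.5*a*x^2 - 0.5*b*y^2 + c*x*y, replaces the deep NNs with plain gradient steps, and uses made-up names and parameters (`alternating_updates`, `hold_steps`, `rounds`). The pursuer minimizes f over x while the evader is frozen, then the roles swap.

```python
def alternating_updates(x=1.0, y=-1.0, a=1.0, b=1.0, c=0.5,
                        lr=0.1, hold_steps=20, rounds=30):
    """Alternating gradient play on a toy zero-sum game (illustrative only).

    Payoff (an assumption, not taken from the article):
        f(x, y) = 0.5*a*x**2 - 0.5*b*y**2 + c*x*y
    The pursuer picks x to minimize f; the evader picks y to maximize f.
    For a, b > 0 the unique saddle point (Nash equilibrium) is (0, 0).
    """
    for _ in range(rounds):
        # Phase 1: evader y is held fixed, so the pursuer faces a
        # stationary objective and descends along df/dx = a*x + c*y.
        for _ in range(hold_steps):
            x -= lr * (a * x + c * y)
        # Phase 2: pursuer x is held fixed, and the evader ascends
        # along df/dy = c*x - b*y.
        for _ in range(hold_steps):
            y += lr * (c * x - b * y)
    return x, y


x_final, y_final = alternating_updates()
print(x_final, y_final)
```

With these assumed parameters (c^2 < a*b), each phase moves the active player close to its best response to the frozen opponent, and the iterates contract toward the saddle point at the origin. The convergence conditions for the actual HRL algorithm are those given in the article, not the ones of this toy example.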

Original language: English
Pages (from-to): 1-13
Number of pages: 13
Journal: IEEE Transactions on Neural Networks and Learning Systems
State: Accepted/In press - 2024

Keywords

  • Alternative delay update (ADU)
  • Approximation algorithms
  • Artificial neural networks
  • Autonomous aerial vehicles
  • Games
  • Heuristic algorithms
  • Kinematics
  • Training
  • Hierarchical reinforcement learning (HRL)
  • Neural networks (NNs)
  • Unmanned aerial vehicle pursuit-evasion (UAV-PE) game
