TY - JOUR
T1 - Game-Based Backstepping Design for Strict-Feedback Nonlinear Multi-Agent Systems Based on Reinforcement Learning
AU - Long, Jia
AU - Yu, Dengxiu
AU - Wen, Guoxing
AU - Li, Li
AU - Wang, Zhen
AU - Chen, C. L.Philip
N1 - Publisher Copyright:
© 2012 IEEE.
PY - 2024/1/1
Y1 - 2024/1/1
N2 - In this article, the game-based backstepping control method is proposed for the high-order nonlinear multi-agent system with unknown dynamic and input saturation. Reinforcement learning (RL) is employed to get the saddle point solution of the tracking game between each agent and the reference signal for achieving robust control. Specifically, the approximate optimal solution of the established Hamilton-Jacobi-Isaacs (HJI) equation is obtained by policy iteration for each subsystem, and the single network adaptive critic (SNAC) architecture is used to reduce the computational burden. In addition, based on the separation operation of the error term from the derivative of the value function, we achieve the different proportions of the two agents in the game to realize the regulation of the final equilibrium point. Different from the general use of the neural network for system identification, the unknown nonlinear dynamic term is approximated based on the state difference obtained by the command filter. Furthermore, a sufficient condition is established to guarantee that the whole system and each subsystem included are uniformly ultimately bounded. Finally, simulation results are given to show the effectiveness of the proposed method.
AB - In this article, the game-based backstepping control method is proposed for the high-order nonlinear multi-agent system with unknown dynamic and input saturation. Reinforcement learning (RL) is employed to get the saddle point solution of the tracking game between each agent and the reference signal for achieving robust control. Specifically, the approximate optimal solution of the established Hamilton-Jacobi-Isaacs (HJI) equation is obtained by policy iteration for each subsystem, and the single network adaptive critic (SNAC) architecture is used to reduce the computational burden. In addition, based on the separation operation of the error term from the derivative of the value function, we achieve the different proportions of the two agents in the game to realize the regulation of the final equilibrium point. Different from the general use of the neural network for system identification, the unknown nonlinear dynamic term is approximated based on the state difference obtained by the command filter. Furthermore, a sufficient condition is established to guarantee that the whole system and each subsystem included are uniformly ultimately bounded. Finally, simulation results are given to show the effectiveness of the proposed method.
KW - Game-based backstepping
KW - high-order multi-agent system
KW - neural network (NN)
KW - reinforcement learning (RL)
KW - tracking game
UR - http://www.scopus.com/inward/record.url?scp=85131731782&partnerID=8YFLogxK
U2 - 10.1109/TNNLS.2022.3177461
DO - 10.1109/TNNLS.2022.3177461
M3 - 文章
AN - SCOPUS:85131731782
SN - 2162-237X
VL - 35
SP - 817
EP - 830
JO - IEEE Transactions on Neural Networks and Learning Systems
JF - IEEE Transactions on Neural Networks and Learning Systems
IS - 1
M1 - 3177461
ER -