TY - GEN
T1 - Reinforcement Learning Based Solution to Two-player Zero-sum Game Using Differentiator
AU - Guo, Xinxin
AU - Yan, Weisheng
AU - Cui, Peng
AU - Zhang, Shouxu
N1 - Publisher Copyright:
© 2018 IEEE.
PY - 2019/1/11
Y1 - 2019/1/11
N2 - In this paper, a synchronous adaptive learning algorithm based on reinforcement learning (RL) is proposed for the solution to two-player zero-sum games for partially unknown systems. To approximate the unknown drift dynamics required to solve the Hamilton-Jacobi-Isaacs equation, one feasible method is to employ a first-order robust exact differentiator (RED) to obtain estimates of the state derivatives; the estimate of the unknown drift dynamics can then be obtained sequentially with the known input and disturbance dynamics. An actor-critic-disturbance neural network (NN) structure is established to approximate the optimal control policy, value function, and disturbance policy, respectively. An online synchronous tuning algorithm is proposed for the three NNs, applying the RL technique and the designed first-order RED. The proposed method guarantees that the optimum is reached under the worst-case disturbance and that the closed-loop system is stabilized, as shown via Lyapunov theory. Finally, the effectiveness of the presented scheme is demonstrated by two simulation examples, one linear and one nonlinear.
AB - In this paper, a synchronous adaptive learning algorithm based on reinforcement learning (RL) is proposed for the solution to two-player zero-sum games for partially unknown systems. To approximate the unknown drift dynamics required to solve the Hamilton-Jacobi-Isaacs equation, one feasible method is to employ a first-order robust exact differentiator (RED) to obtain estimates of the state derivatives; the estimate of the unknown drift dynamics can then be obtained sequentially with the known input and disturbance dynamics. An actor-critic-disturbance neural network (NN) structure is established to approximate the optimal control policy, value function, and disturbance policy, respectively. An online synchronous tuning algorithm is proposed for the three NNs, applying the RL technique and the designed first-order RED. The proposed method guarantees that the optimum is reached under the worst-case disturbance and that the closed-loop system is stabilized, as shown via Lyapunov theory. Finally, the effectiveness of the presented scheme is demonstrated by two simulation examples, one linear and one nonlinear.
KW - First-order robust exact differentiator
KW - neural network
KW - reinforcement learning
KW - two-player zero-sum game
UR - http://www.scopus.com/inward/record.url?scp=85061481698&partnerID=8YFLogxK
U2 - 10.1109/ICARM.2018.8610737
DO - 10.1109/ICARM.2018.8610737
M3 - Conference contribution
AN - SCOPUS:85061481698
T3 - ICARM 2018 - 2018 3rd International Conference on Advanced Robotics and Mechatronics
SP - 708
EP - 713
BT - ICARM 2018 - 2018 3rd International Conference on Advanced Robotics and Mechatronics
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 3rd IEEE International Conference on Advanced Robotics and Mechatronics, ICARM 2018
Y2 - 18 July 2018 through 20 July 2018
ER -