TY - GEN
T1 - Reinforcement Learning Based Solution to Two-player Zero-sum Game Using Differentiator
AU - Guo, Xinxin
AU - Yan, Weisheng
AU - Cui, Peng
AU - Zhang, Shouxu
N1 - Publisher Copyright:
© 2018 IEEE.
PY - 2019/1/11
Y1 - 2019/1/11
N2 - In this paper, a synchronous adaptive learning algorithm based on reinforcement learning (RL) is proposed for the solution to two-player zero-sum games for partially unknown systems. To approximate the unknown drift dynamics required to solve the Hamilton-Jacobi-Isaacs equation, one feasible method is to employ a first-order robust exact differentiator (RED) to obtain estimates of the state derivatives; the estimate of the unknown drift dynamics can then be obtained sequentially with the known input and disturbance dynamics. An actor-critic-disturbance neural network (NN) structure is established to approximate the optimal control policy, value function, and disturbance policy, respectively. An online synchronous tuning algorithm is proposed for the three NNs, applying the RL technique and the designed first-order RED. The proposed method guarantees that the optimum is reached under the worst-case disturbance and that the closed-loop system is stabilized, as shown via Lyapunov theory. Finally, the effectiveness of the presented scheme is demonstrated by two simulation examples, one linear and one nonlinear.
AB - In this paper, a synchronous adaptive learning algorithm based on reinforcement learning (RL) is proposed for the solution to two-player zero-sum games for partially unknown systems. To approximate the unknown drift dynamics required to solve the Hamilton-Jacobi-Isaacs equation, one feasible method is to employ a first-order robust exact differentiator (RED) to obtain estimates of the state derivatives; the estimate of the unknown drift dynamics can then be obtained sequentially with the known input and disturbance dynamics. An actor-critic-disturbance neural network (NN) structure is established to approximate the optimal control policy, value function, and disturbance policy, respectively. An online synchronous tuning algorithm is proposed for the three NNs, applying the RL technique and the designed first-order RED. The proposed method guarantees that the optimum is reached under the worst-case disturbance and that the closed-loop system is stabilized, as shown via Lyapunov theory. Finally, the effectiveness of the presented scheme is demonstrated by two simulation examples, one linear and one nonlinear.
KW - First-order robust exact differentiator
KW - neural network
KW - reinforcement learning
KW - two-player zero-sum game
UR - http://www.scopus.com/inward/record.url?scp=85061481698&partnerID=8YFLogxK
U2 - 10.1109/ICARM.2018.8610737
DO - 10.1109/ICARM.2018.8610737
M3 - Conference contribution
AN - SCOPUS:85061481698
T3 - ICARM 2018 - 2018 3rd International Conference on Advanced Robotics and Mechatronics
SP - 708
EP - 713
BT - ICARM 2018 - 2018 3rd International Conference on Advanced Robotics and Mechatronics
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 3rd IEEE International Conference on Advanced Robotics and Mechatronics, ICARM 2018
Y2 - 18 July 2018 through 20 July 2018
ER -