Robust hierarchical games of linear discrete-time systems based on off-policy model-free reinforcement learning

Xiao Ma; Yuan Yuan

doi:10.1016/j.jfranklin.2024.106711

School of Astronautics

Northwestern Polytechnical University Xian

Research output: Contribution to journal › Article › peer-review

2 Citations (Scopus)

Abstract

An off-policy model-free reinforcement learning (RL) algorithm is proposed for a robust hierarchical game while considering incomplete information and input constraints. The robust hierarchical game exhibits characteristics of a Stackelberg–Nash (SN) game, where equilibrium points are designated as Stackelberg–Nash–Saddle equilibrium (SNE) points. An off-policy method is employed for the RL algorithm, addressing input constraints by using excitation input instead of real-time updated policies as control inputs. Moreover, a model-free method is implemented for the off-policy RL algorithm, accounting for the challenge posed by incomplete information. The goal of this paper is to develop an off-policy model-free RL algorithm to obtain approximate SNE policies of the robust hierarchical game with incomplete information and input constraints. Furthermore, the convergence and effectiveness of the off-policy model-free RL algorithm are guaranteed by proving the equivalence of the Bellman equation between nominal SNE policies and approximate SNE policies. Finally, a simulation is provided to verify the advantage of the developed algorithm.
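
The off-policy, model-free idea in the abstract can be illustrated with a much simpler problem. The sketch below is not the paper's algorithm: it assumes a single controller (an ordinary linear-quadratic regulator) with no leader/follower structure, no disturbance player, and no input constraints. The matrices `A`, `B`, `Q`, `R` and all function names are hypothetical. Data are collected once under a pure excitation input, and the target gain is then evaluated and improved from that fixed data set, using only measured transitions rather than `A` and `B`.

```python
import numpy as np

def quad_features(z):
    """Return phi(z) with z @ H @ z == phi(z) @ theta for symmetric H."""
    rows, cols = np.triu_indices(len(z))
    scale = np.where(rows == cols, 1.0, 2.0)  # off-diagonal terms count twice
    return scale * z[rows] * z[cols]

def theta_to_matrix(theta, dim):
    """Rebuild the symmetric matrix H from its upper-triangular entries."""
    H = np.zeros((dim, dim))
    H[np.triu_indices(dim)] = theta
    return H + H.T - np.diag(np.diag(H))

def collect_data(A, B, steps, rng, noise_std=1.0):
    """Run the plant once under a pure excitation input (behavior policy)."""
    n, m = B.shape
    x = rng.standard_normal(n)
    data = []
    for _ in range(steps):
        u = noise_std * rng.standard_normal(m)  # excitation, not the learned policy
        x_next = A @ x + B @ u                  # A, B are used only to simulate
        data.append((x, u, x_next))
        x = x_next
    return data

def off_policy_iteration(data, Q, R, n, m, iterations=15):
    """Evaluate and improve a target gain K from the fixed data set."""
    K = np.zeros((m, n))  # assumed stabilizing because the example plant is stable
    for _ in range(iterations):
        Phi, cost = [], []
        for x, u, x_next in data:
            z = np.concatenate([x, u])
            z_next = np.concatenate([x_next, -K @ x_next])  # target policy action
            Phi.append(quad_features(z) - quad_features(z_next))
            cost.append(x @ Q @ x + u @ R @ u)
        theta, *_ = np.linalg.lstsq(np.array(Phi), np.array(cost), rcond=None)
        H = theta_to_matrix(theta, n + m)
        K = np.linalg.solve(H[n:, n:], H[n:, :n])  # greedy gain from the Q-function
    return K

def riccati_gain(A, B, Q, R, iterations=2000):
    """Model-based reference gain, used only to check the learned result."""
    P = Q.copy()
    for _ in range(iterations):
        G = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ A - A.T @ P @ B @ G
    return np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

# Illustrative (hypothetical) stable plant and weights
A = np.array([[0.9, 0.1], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.eye(1)
data = collect_data(A, B, steps=200, rng=np.random.default_rng(0))
K_learned = off_policy_iteration(data, Q, R, n=2, m=1)
K_model = riccati_gain(A, B, Q, R)
print(np.max(np.abs(K_learned - K_model)))  # gap between learned and model-based gain
```

The learning step never applies the gain being improved to the plant; it only reuses the excitation data. That separation between the applied input and the evaluated policy is the sense in which the sketch is off-policy.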

Original language	English
Article number	106711
Journal	Journal of the Franklin Institute
Volume	361
Issue number	7
DOI	https://doi.org/10.1016/j.jfranklin.2024.106711
Publication status	Published - May 2024

Cite this

@article{aa5a21f16dc641b490860a4cd970215c,

title = "Robust hierarchical games of linear discrete-time systems based on off-policy model-free reinforcement learning",

abstract = "An off-policy model-free reinforcement learning (RL) algorithm is proposed for a robust hierarchical game while considering incomplete information and input constraints. The robust hierarchical game exhibits characteristics of a Stackelberg–Nash (SN) game, where equilibrium points are designated as Stackelberg–Nash–Saddle equilibrium (SNE) points. An off-policy method is employed for the RL algorithm, addressing input constraints by using excitation input instead of real-time updated policies as control inputs. Moreover, a model-free method is implemented for the off-policy RL algorithm, accounting for the challenge posed by incomplete information. The goal of this paper is to develop an off-policy model-free RL algorithm to obtain approximate SNE policies of the robust hierarchical game with incomplete information and input constraints. Furthermore, the convergence and effectiveness of the off-policy model-free RL algorithm are guaranteed by proving the equivalence of the Bellman equation between nominal SNE policies and approximate SNE policies. Finally, a simulation is provided to verify the advantage of the developed algorithm.",

keywords = "Model-free, Off-policy, Reinforcement learning, Robust hierarchical game",

author = "Xiao Ma and Yuan Yuan",

note = "Publisher Copyright: {\textcopyright} 2024",

year = "2024",

month = may,

doi = "10.1016/j.jfranklin.2024.106711",

language = "English",

volume = "361",

journal = "Journal of the Franklin Institute",

issn = "0016-0032",

publisher = "Elsevier Ltd",

number = "7",

}

TY - JOUR

T1 - Robust hierarchical games of linear discrete-time systems based on off-policy model-free reinforcement learning

AU - Ma, Xiao

AU - Yuan, Yuan

PY - 2024/5

Y1 - 2024/5

N2 - An off-policy model-free reinforcement learning (RL) algorithm is proposed for a robust hierarchical game while considering incomplete information and input constraints. The robust hierarchical game exhibits characteristics of a Stackelberg–Nash (SN) game, where equilibrium points are designated as Stackelberg–Nash–Saddle equilibrium (SNE) points. An off-policy method is employed for the RL algorithm, addressing input constraints by using excitation input instead of real-time updated policies as control inputs. Moreover, a model-free method is implemented for the off-policy RL algorithm, accounting for the challenge posed by incomplete information. The goal of this paper is to develop an off-policy model-free RL algorithm to obtain approximate SNE policies of the robust hierarchical game with incomplete information and input constraints. Furthermore, the convergence and effectiveness of the off-policy model-free RL algorithm are guaranteed by proving the equivalence of the Bellman equation between nominal SNE policies and approximate SNE policies. Finally, a simulation is provided to verify the advantage of the developed algorithm.

AB - An off-policy model-free reinforcement learning (RL) algorithm is proposed for a robust hierarchical game while considering incomplete information and input constraints. The robust hierarchical game exhibits characteristics of a Stackelberg–Nash (SN) game, where equilibrium points are designated as Stackelberg–Nash–Saddle equilibrium (SNE) points. An off-policy method is employed for the RL algorithm, addressing input constraints by using excitation input instead of real-time updated policies as control inputs. Moreover, a model-free method is implemented for the off-policy RL algorithm, accounting for the challenge posed by incomplete information. The goal of this paper is to develop an off-policy model-free RL algorithm to obtain approximate SNE policies of the robust hierarchical game with incomplete information and input constraints. Furthermore, the convergence and effectiveness of the off-policy model-free RL algorithm are guaranteed by proving the equivalence of the Bellman equation between nominal SNE policies and approximate SNE policies. Finally, a simulation is provided to verify the advantage of the developed algorithm.

KW - Model-free

KW - Off-policy

KW - Reinforcement learning

KW - Robust hierarchical game

UR - http://www.scopus.com/inward/record.url?scp=85189675479&partnerID=8YFLogxK

U2 - 10.1016/j.jfranklin.2024.106711

DO - 10.1016/j.jfranklin.2024.106711

M3 - Article

AN - SCOPUS:85189675479

SN - 0016-0032

VL - 361

JO - Journal of the Franklin Institute

JF - Journal of the Franklin Institute

IS - 7

M1 - 106711

ER -
