TY - JOUR
T1 - Robust hierarchical games of linear discrete-time systems based on off-policy model-free reinforcement learning
AU - Ma, Xiao
AU - Yuan, Yuan
N1 - Publisher Copyright:
© 2024
PY - 2024/5
Y1 - 2024/5
N2 - An off-policy model-free reinforcement learning (RL) algorithm is proposed for a robust hierarchical game while considering incomplete information and input constraints. The robust hierarchical game exhibits characteristics of a Stackelberg–Nash (SN) game, where equilibrium points are designated as Stackelberg–Nash–Saddle equilibrium (SNE) points. An off-policy method is employed for the RL algorithm, addressing input constraints by using excitation inputs instead of real-time updated policies as control inputs. Moreover, a model-free method is implemented for the off-policy RL algorithm, accounting for the challenge posed by incomplete information. The goal of this paper is to develop an off-policy model-free RL algorithm to obtain approximate SNE policies of the robust hierarchical game with incomplete information and input constraints. Furthermore, the convergence and effectiveness of the off-policy model-free RL algorithm are guaranteed by proving the equivalence of the Bellman equation between nominal SNE policies and approximate SNE policies. Finally, a simulation is provided to verify the advantages of the developed algorithm.
AB - An off-policy model-free reinforcement learning (RL) algorithm is proposed for a robust hierarchical game while considering incomplete information and input constraints. The robust hierarchical game exhibits characteristics of a Stackelberg–Nash (SN) game, where equilibrium points are designated as Stackelberg–Nash–Saddle equilibrium (SNE) points. An off-policy method is employed for the RL algorithm, addressing input constraints by using excitation inputs instead of real-time updated policies as control inputs. Moreover, a model-free method is implemented for the off-policy RL algorithm, accounting for the challenge posed by incomplete information. The goal of this paper is to develop an off-policy model-free RL algorithm to obtain approximate SNE policies of the robust hierarchical game with incomplete information and input constraints. Furthermore, the convergence and effectiveness of the off-policy model-free RL algorithm are guaranteed by proving the equivalence of the Bellman equation between nominal SNE policies and approximate SNE policies. Finally, a simulation is provided to verify the advantages of the developed algorithm.
KW - Model-free
KW - Off-policy
KW - Reinforcement learning
KW - Robust hierarchical game
UR - http://www.scopus.com/inward/record.url?scp=85189675479&partnerID=8YFLogxK
U2 - 10.1016/j.jfranklin.2024.106711
DO - 10.1016/j.jfranklin.2024.106711
M3 - Article
AN - SCOPUS:85189675479
SN - 0016-0032
VL - 361
JO - Journal of the Franklin Institute
JF - Journal of the Franklin Institute
IS - 7
M1 - 106711
ER -