TY - JOUR
T1 - Integral-Reinforcement-Learning-Based Hierarchical Optimal Evolutionary Strategy for Continuous Action Social Dilemma Games
AU - Fan, Litong
AU - Yu, Dengxiu
AU - Wang, Zhen
N1 - Publisher Copyright:
© 2014 IEEE.
PY - 2024
Y1 - 2024
N2 - This article presents a framework for exploring optimal evolutionary strategies in continuous-action social dilemma games with a hierarchical structure comprising a leader and multifollowers. Previous studies in game theory have frequently overlooked the hierarchical structure among individuals, assuming that decisions are made simultaneously. Here, we propose a hierarchical structure for continuous action games that involves a leader and followers to enhance cooperation. The optimal evolutionary strategy for the leader is to guide the followers' actions to maximize overall benefits by exerting minimal control, while the followers aim to maximize their payoff by making minimal changes to their strategies. We establish the coupled Hamilton-Jacobi-Bellman (HJB) equations to find the optimal evolutionary strategy. To address the complexity of asymmetric roles arising from the leader-follower structure, we introduce an integral reinforcement learning (RL) algorithm known as two-level heuristic dynamic programming (HDP)-based value iteration (VI). The implementation of the algorithm utilizes neural networks (NNs) to approximate the value functions. Moreover, the convergence of the proposed algorithm is demonstrated. Additionally, three social dilemma models are presented to validate the efficacy of the proposed algorithm.
KW - Hamilton-Jacobi-Bellman (HJB)
KW - hierarchical
KW - integral reinforcement learning
KW - social dilemma
KW - value iteration (VI)
UR - http://www.scopus.com/inward/record.url?scp=85206212687&partnerID=8YFLogxK
U2 - 10.1109/TCSS.2024.3409833
DO - 10.1109/TCSS.2024.3409833
M3 - Article
AN - SCOPUS:85206212687
SN - 2329-924X
VL - 11
SP - 6807
EP - 6818
JO - IEEE Transactions on Computational Social Systems
JF - IEEE Transactions on Computational Social Systems
IS - 5
ER -