TY - JOUR
T1 - Effect of state transition triggered by reinforcement learning in evolutionary prisoner's dilemma game
AU - Guo, Hao
AU - Wang, Zhen
AU - Song, Zhao
AU - Yuan, Yuan
AU - Deng, Xinyang
AU - Li, Xuelong
N1 - Publisher Copyright:
© 2022 Elsevier B.V.
PY - 2022/10/28
Y1 - 2022/10/28
N2 - Cooperative behavior is essential for resolving conflicts between collective and individual benefits, and evolutionary game theory provides a key framework for studying this problem. Decision-making by human or automated agents occurs not only in static environments but also in dynamic interactive environments. Since reinforcement learning is well suited to problems framed in terms of states, actions, and environment, we propose a game model with individual state transitions driven by a self-regarding Q-learning algorithm. Specifically, we investigate, for the first time, a two-state two-action game in a dynamic interactive environment, where each agent chooses, based on its Q-table, either to participate in the networked game (i.e., an active agent) or to cut off all of its links (i.e., an inactive agent). Numerical simulations show that cooperation reaches its maximal level for moderate values of the fixed reward obtained by inactive agents. In particular, long-term expectations and large learning rates are more effective in promoting cooperation. Furthermore, when the dynamic interactive environment reaches a stable state, active cooperators have more active neighbors than active defectors, which in turn have more active neighbors than inactive agents. Finally, we verify the theoretical analysis from the perspective of state transitions.
AB - Cooperative behavior is essential for resolving conflicts between collective and individual benefits, and evolutionary game theory provides a key framework for studying this problem. Decision-making by human or automated agents occurs not only in static environments but also in dynamic interactive environments. Since reinforcement learning is well suited to problems framed in terms of states, actions, and environment, we propose a game model with individual state transitions driven by a self-regarding Q-learning algorithm. Specifically, we investigate, for the first time, a two-state two-action game in a dynamic interactive environment, where each agent chooses, based on its Q-table, either to participate in the networked game (i.e., an active agent) or to cut off all of its links (i.e., an inactive agent). Numerical simulations show that cooperation reaches its maximal level for moderate values of the fixed reward obtained by inactive agents. In particular, long-term expectations and large learning rates are more effective in promoting cooperation. Furthermore, when the dynamic interactive environment reaches a stable state, active cooperators have more active neighbors than active defectors, which in turn have more active neighbors than inactive agents. Finally, we verify the theoretical analysis from the perspective of state transitions.
KW - Action selection
KW - Cooperation
KW - Q-learning
KW - Social dilemma
KW - State transition
UR - http://www.scopus.com/inward/record.url?scp=85138031325&partnerID=8YFLogxK
U2 - 10.1016/j.neucom.2022.08.023
DO - 10.1016/j.neucom.2022.08.023
M3 - Article
AN - SCOPUS:85138031325
SN - 0925-2312
VL - 511
SP - 187
EP - 197
JO - Neurocomputing
JF - Neurocomputing
ER -