A Multitier reinforcement learning model for a cooperative multiagent system

Haobin Shi, Liangjing Zhai, Haibo Wu, Maxwell Hwang, Kao Shing Hwang, Hsuan Pei Hsu

Research output: Contribution to journal › Article › peer-review

17 Citations (Scopus)

Abstract

In multiagent cooperative systems with value-based reinforcement learning, agents learn to complete a task through an optimal policy obtained by iterating value and policy improvement. However, designing a policy that avoids cooperation dilemmas and reaches a common consensus among agents remains an important issue. This article proposes a method that improves the coordination ability of agents in cooperative systems by assessing their cooperative tendency, and that increases the collective payoff through a candidate policy. The method learns cooperative rules by recording the cooperation probabilities of agents in a multitier reinforcement learning model. Candidate action sets are selected through the candidate policy, which considers the payoff of the coalition; the optimal strategy is then selected from these candidate action sets through the Nash bargaining solution (NBS). The method is tested on two cooperative tasks. The results show that the proposed algorithm, which addresses the instability and ambiguity of win-or-learn-fast policy hill-climbing (WoLF-PHC) and requires significantly less memory than the NBS, is more stable and more efficient than other methods.
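The two-stage selection described above can be illustrated with a minimal sketch. This is not the paper's implementation: the payoff table, the candidate-set size k, and the choice of mutual defection as the disagreement point are all hypothetical, and the learned cooperation probabilities of the multitier model are omitted.

```python
from math import prod

# Hypothetical per-agent payoffs for each joint action of two agents
# ("c" = cooperate, "d" = defect); not taken from the paper.
payoff = {
    ("c", "c"): (3.0, 3.0),   # mutual cooperation
    ("c", "d"): (0.0, 5.0),
    ("d", "c"): (5.0, 0.0),
    ("d", "d"): (1.0, 1.0),   # mutual defection
}
disagreement = payoff[("d", "d")]  # assumed fallback if bargaining fails

def candidate_set(payoff, k=2):
    """Candidate policy (sketch): keep the k joint actions with the
    highest coalition (summed) payoff."""
    return sorted(payoff.items(), key=lambda kv: sum(kv[1]), reverse=True)[:k]

def nash_bargaining(candidates, d):
    """NBS over the candidate set: pick the joint action maximizing the
    product of each agent's gain over its disagreement payoff."""
    best, best_val = None, float("-inf")
    for action, u in candidates:
        gains = [ui - di for ui, di in zip(u, d)]
        if min(gains) < 0:     # individually irrational for some agent: skip
            continue
        val = prod(gains)
        if val > best_val:
            best, best_val = action, val
    return best

print(nash_bargaining(candidate_set(payoff), disagreement))
# -> ('c', 'c'): mutual cooperation maximizes the Nash product
```

Restricting the NBS to a small candidate set, rather than bargaining over all joint actions, is what keeps the memory requirement below that of a full NBS computation.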

Original language: English
Article number: 8976294
Pages (from-to): 636-644
Number of pages: 9
Journal: IEEE Transactions on Cognitive and Developmental Systems
Volume: 12
Issue number: 3
DOI
Publication status: Published - Sep 2020
