A Multitier reinforcement learning model for a cooperative multiagent system

Haobin Shi, Liangjing Zhai, Haibo Wu, Maxwell Hwang, Kao Shing Hwang, Hsuan Pei Hsu

Research output: Contribution to journal › Article › peer-review

17 Citations (Scopus)

Abstract

In multiagent cooperative systems with value-based reinforcement learning, agents learn to complete a task through an optimal policy obtained by iterating value and policy improvement. However, designing a policy that avoids cooperation dilemmas and reaches a common consensus among agents remains an important issue. This article proposes a method that improves the coordination ability of agents in cooperative systems by assessing their cooperative tendency, and that increases the collective payoff through a candidate policy. The method learns cooperative rules by recording the cooperation probabilities of agents in a multitier reinforcement learning model. Candidate action sets are selected through the candidate policy, which considers the payoff of the coalition; the optimal strategy is then selected from these candidate action sets through the Nash bargaining solution (NBS). The method is tested on two cooperative tasks. The results show that the proposed algorithm, which addresses the instability and ambiguity of win-or-learn-fast policy hill-climbing (WoLF-PHC) and requires significantly less memory than the NBS, is more stable and more efficient than other methods.
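The two-stage selection described above can be illustrated with a minimal sketch. This is not the paper's implementation: the payoff table, the candidate-set size k, and the choice of mutual defection as the disagreement point are all hypothetical, and the learned cooperation probabilities of the multitier model are omitted.

```python
from math import prod

# Hypothetical per-agent payoffs for each joint action of two agents
# ("c" = cooperate, "d" = defect); not taken from the paper.
payoff = {
    ("c", "c"): (3.0, 3.0),   # mutual cooperation
    ("c", "d"): (0.0, 5.0),
    ("d", "c"): (5.0, 0.0),
    ("d", "d"): (1.0, 1.0),   # mutual defection
}
disagreement = payoff[("d", "d")]  # assumed fallback if bargaining fails

def candidate_set(payoff, k=2):
    """Candidate policy (sketch): keep the k joint actions with the
    highest coalition (summed) payoff."""
    return sorted(payoff.items(), key=lambda kv: sum(kv[1]), reverse=True)[:k]

def nash_bargaining(candidates, d):
    """NBS over the candidate set: pick the joint action maximizing the
    product of each agent's gain over its disagreement payoff."""
    best, best_val = None, float("-inf")
    for action, u in candidates:
        gains = [ui - di for ui, di in zip(u, d)]
        if min(gains) < 0:     # individually irrational for some agent: skip
            continue
        val = prod(gains)
        if val > best_val:
            best, best_val = action, val
    return best

print(nash_bargaining(candidate_set(payoff), disagreement))
# -> ('c', 'c'): mutual cooperation maximizes the Nash product
```

Restricting the NBS to a small candidate set, rather than bargaining over all joint actions, is what keeps the memory requirement below that of a full NBS computation.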

Original language: English
Article number: 8976294
Pages (from-to): 636-644
Number of pages: 9
Journal: IEEE Transactions on Cognitive and Developmental Systems
Volume: 12
Issue number: 3
DOI
Publication status: Published - Sep 2020
