Abstract
In multiagent cooperative systems with value-based reinforcement learning, agents learn to complete a task through an optimal policy obtained by value-policy improvement iterations. However, designing a policy that avoids cooperation dilemmas and reaches a common consensus among agents remains an important issue. This article proposes a method that improves the coordination ability of agents in cooperative systems by assessing their cooperative tendency and increases the collective payoff through a candidate policy. The method learns cooperative rules by recording the agents' cooperation probabilities in a multitier reinforcement learning model. Candidate action sets are selected by the candidate policy, which takes the payoff of the coalition into account. The optimal strategy is then selected from these candidate action sets through the Nash bargaining solution (NBS). The method is tested on two cooperative tasks. The results show that the proposed algorithm, which addresses the instability and ambiguity of win or learn fast policy hill-climbing (WoLF-PHC) and requires significantly less memory than the NBS, is more stable and more efficient than the compared methods.
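The candidate-policy and NBS selection steps can be pictured with a minimal sketch below; the blended scoring rule, the `coop_prob` weighting, and the zero disagreement point are illustrative assumptions for a toy two-agent example, not the paper's exact formulation.

```python
import itertools
import numpy as np

def candidate_actions(q_own, q_coalition, coop_prob, k=2):
    """Score each action by blending the agent's own Q-values with a
    coalition-payoff estimate, weighted by the agent's cooperation
    tendency, and keep the top-k actions as candidates
    (hypothetical scoring rule for illustration)."""
    scores = (1.0 - coop_prob) * q_own + coop_prob * q_coalition
    return list(np.argsort(scores)[::-1][:k])

def nbs_select(candidate_sets, joint_payoff, disagreement):
    """Among joint actions built from the candidate sets, return the one
    maximizing the Nash product of per-agent gains over the disagreement
    point (standard NBS restricted to a finite candidate set)."""
    best_joint, best_product = None, -np.inf
    for joint in itertools.product(*candidate_sets):
        gains = joint_payoff(joint) - disagreement
        if np.all(gains >= 0):            # keep only individually rational points
            prod = float(np.prod(gains))
            if prod > best_product:
                best_joint, best_product = joint, prod
    return best_joint

if __name__ == "__main__":
    # Toy setup: 2 agents, 3 actions each, random Q-values and payoffs.
    rng = np.random.default_rng(0)
    q_own = [rng.uniform(size=3), rng.uniform(size=3)]
    q_coal = rng.uniform(size=3)                  # shared coalition-payoff estimate
    coop_prob = np.array([0.7, 0.4])              # learned cooperation tendencies
    sets = [candidate_actions(q_own[i], q_coal, coop_prob[i]) for i in range(2)]
    payoff_table = rng.uniform(size=(3, 3, 2))    # per-agent payoff for each joint action
    joint = nbs_select(sets, lambda a: payoff_table[a[0], a[1]],
                       disagreement=np.zeros(2))
    print("candidate sets:", sets, "selected joint action:", joint)
```

Restricting the NBS search to the candidate sets is what keeps the joint-action enumeration small; searching the full joint action space would grow exponentially with the number of agents.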
Original language | English |
---|---|
Article number | 8976294 |
Pages (from-to) | 636-644 |
Number of pages | 9 |
Journal | IEEE Transactions on Cognitive and Developmental Systems |
Volume | 12 |
Issue number | 3 |
DOIs | |
State | Published - Sep 2020 |
Keywords
- Cooperation game
- dilemma
- multiagent systems
- Nash bargaining solution (NBS)
- Q-learning
- reinforcement learning