Abstract
In multiagent cooperative systems with value-based reinforcement learning, agents learn to complete a task through an optimal policy obtained by value-policy improvement iterations. However, designing a policy that avoids cooperation dilemmas and reaches a common consensus among agents remains an important issue. This article proposes a method that improves the coordination ability of agents in cooperative systems by assessing their cooperative tendency and increases the collective payoff through a candidate policy. The method learns cooperative rules by recording the agents' cooperation probabilities in a multitier reinforcement learning model. Candidate action sets are selected by the candidate policy, which takes the payoff of the coalition into account. The optimal strategy is then selected from these candidate action sets through the Nash bargaining solution (NBS). The method is tested on two cooperative tasks. The results show that the proposed algorithm, which addresses the instability and ambiguity of win or learn fast policy hill-climbing (WoLF-PHC) and requires significantly less memory than the NBS, is more stable and more efficient than the compared methods.
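The candidate-policy and NBS selection steps can be pictured with a minimal sketch below; the blended scoring rule, the `coop_prob` weighting, and the zero disagreement point are illustrative assumptions for a toy two-agent example, not the paper's exact formulation.

```python
import itertools
import numpy as np

def candidate_actions(q_own, q_coalition, coop_prob, k=2):
    """Score each action by blending the agent's own Q-values with a
    coalition-payoff estimate, weighted by the agent's cooperation
    tendency, and keep the top-k actions as candidates
    (hypothetical scoring rule for illustration)."""
    scores = (1.0 - coop_prob) * q_own + coop_prob * q_coalition
    return list(np.argsort(scores)[::-1][:k])

def nbs_select(candidate_sets, joint_payoff, disagreement):
    """Among joint actions built from the candidate sets, return the one
    maximizing the Nash product of per-agent gains over the disagreement
    point (standard NBS restricted to a finite candidate set)."""
    best_joint, best_product = None, -np.inf
    for joint in itertools.product(*candidate_sets):
        gains = joint_payoff(joint) - disagreement
        if np.all(gains >= 0):            # keep only individually rational points
            prod = float(np.prod(gains))
            if prod > best_product:
                best_joint, best_product = joint, prod
    return best_joint

if __name__ == "__main__":
    # Toy setup: 2 agents, 3 actions each, random Q-values and payoffs.
    rng = np.random.default_rng(0)
    q_own = [rng.uniform(size=3), rng.uniform(size=3)]
    q_coal = rng.uniform(size=3)                  # shared coalition-payoff estimate
    coop_prob = np.array([0.7, 0.4])              # learned cooperation tendencies
    sets = [candidate_actions(q_own[i], q_coal, coop_prob[i]) for i in range(2)]
    payoff_table = rng.uniform(size=(3, 3, 2))    # per-agent payoff for each joint action
    joint = nbs_select(sets, lambda a: payoff_table[a[0], a[1]],
                       disagreement=np.zeros(2))
    print("candidate sets:", sets, "selected joint action:", joint)
```

Restricting the NBS search to the candidate sets is what keeps the joint-action enumeration small; searching the full joint action space would grow exponentially with the number of agents.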
Original language | English |
---|---|
Article number | 8976294 |
Pages (from-to) | 636-644 |
Number of pages | 9 |
Journal | IEEE Transactions on Cognitive and Developmental Systems |
Volume | 12 |
Issue number | 3 |
DOIs | |
State | Published - Sep 2020 |
Keywords
- Cooperation game
- dilemma
- multiagent systems
- Nash bargaining solution (NBS)
- Q-learning
- reinforcement learning