TY - JOUR
T1 - PMIC
T2 - 39th International Conference on Machine Learning, ICML 2022
AU - Li, Pengyi
AU - Tang, Hongyao
AU - Yang, Tianpei
AU - Hao, Xiaotian
AU - Sang, Tong
AU - Zheng, Yan
AU - Hao, Jianye
AU - Taylor, Matthew E.
AU - Tao, Wenyuan
AU - Wang, Zhen
N1 - Publisher Copyright:
Copyright © 2022 by the author(s)
PY - 2022
Y1 - 2022
N2 - Learning to collaborate is critical in MultiAgent Reinforcement Learning (MARL). Previous works promote collaboration by maximizing the correlation of agents' behaviors, which is typically characterized by Mutual Information (MI) in different forms. However, we reveal sub-optimal collaborative behaviors also emerge with strong correlations, and simply maximizing the MI can, surprisingly, hinder the learning towards better collaboration. To address this issue, we propose a novel MARL framework, called Progressive Mutual Information Collaboration (PMIC), for more effective MI-driven collaboration. PMIC uses a new collaboration criterion measured by the MI between global states and joint actions. Based on this criterion, the key idea of PMIC is maximizing the MI associated with superior collaborative behaviors and minimizing the MI associated with inferior ones. The two MI objectives play complementary roles by facilitating better collaborations while avoiding falling into sub-optimal ones. Experiments on a wide range of MARL benchmarks show the superior performance of PMIC compared with other algorithms.
AB - Learning to collaborate is critical in MultiAgent Reinforcement Learning (MARL). Previous works promote collaboration by maximizing the correlation of agents' behaviors, which is typically characterized by Mutual Information (MI) in different forms. However, we reveal sub-optimal collaborative behaviors also emerge with strong correlations, and simply maximizing the MI can, surprisingly, hinder the learning towards better collaboration. To address this issue, we propose a novel MARL framework, called Progressive Mutual Information Collaboration (PMIC), for more effective MI-driven collaboration. PMIC uses a new collaboration criterion measured by the MI between global states and joint actions. Based on this criterion, the key idea of PMIC is maximizing the MI associated with superior collaborative behaviors and minimizing the MI associated with inferior ones. The two MI objectives play complementary roles by facilitating better collaborations while avoiding falling into sub-optimal ones. Experiments on a wide range of MARL benchmarks show the superior performance of PMIC compared with other algorithms.
UR - http://www.scopus.com/inward/record.url?scp=85149662646&partnerID=8YFLogxK
M3 - 会议文章
AN - SCOPUS:85149662646
SN - 2640-3498
VL - 162
SP - 12979
EP - 12997
JO - Proceedings of Machine Learning Research
JF - Proceedings of Machine Learning Research
Y2 - 17 July 2022 through 23 July 2022
ER -