Learning controlled and targeted communication with the centralized critic for the multi-agent system

Qingshuang Sun, Yuan Yao, Peng Yi, Yu Jiao Hu, Zhao Yang, Gang Yang, Xingshe Zhou

Research output: Contribution to journal (Article, peer-reviewed)

5 Scopus citations

Abstract

Multi-agent deep reinforcement learning (MDRL) has attracted attention as a means of solving complex tasks. Two main challenges in MDRL are non-stationarity and partial observability from the perspective of individual agents, both of which hinder agents from learning cooperative policies. In this study, Controlled and Targeted Communication with the Centralized Critic (COTAC) is proposed, establishing a paradigm of centralized learning and decentralized execution with partial communication. COTAC decouples how the multi-agent system (MAS) obtains environmental information during training from how it obtains it during execution. Specifically, COTAC makes the environment faced by the agents stationary during training and learns partial communication to overcome partial observability during execution. On this basis, decentralized actors learn controlled and targeted communication together with their policies, which are optimized by centralized critics during training. As a result, agents learn both when to communicate when sending messages and how to aggregate information in a targeted way when receiving them. COTAC is evaluated on two multi-agent scenarios with continuous spaces. Experimental results demonstrate that only some agents, those holding important information, choose to send messages, and that agents aggregate received information in a targeted way by identifying the relevant important information. The agents thus maintain better cooperative performance while reducing the system's communication traffic.
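The abstract names two communication decisions: when an agent sends a message, and how a receiver aggregates incoming messages in a targeted way. The following is a minimal, hypothetical sketch of those two ideas, not COTAC's actual architecture. It assumes a sigmoid gate with fixed, illustrative weights (`gate_w`, `gate_b`) for the send decision, and plain scaled dot-product attention that uses raw hidden-state vectors as queries, keys, and values for aggregation. The paper's learned networks, training procedure, and centralized critic are not reproduced here.

```python
import math
from typing import List, Optional, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def send_gate(hidden: Vector, gate_w: Vector, gate_b: float,
              threshold: float = 0.5) -> bool:
    """Hypothetical 'when to communicate' gate: the agent sends only if
    sigmoid(w . h + b) exceeds the threshold."""
    return sigmoid(dot(hidden, gate_w) + gate_b) > threshold


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax (subtracts the max score first)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate(query: Vector, messages: List[Vector]) -> Optional[Vector]:
    """Targeted aggregation as scaled dot-product attention: messages more
    similar to the receiver's own state get larger weights.
    Returns None when no other agent chose to send."""
    if not messages:
        return None
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, m) / scale for m in messages])
    dim = len(messages[0])
    return [sum(w * m[i] for w, m in zip(weights, messages)) for i in range(dim)]


def communication_round(hiddens: List[Vector], gate_w: Vector,
                        gate_b: float) -> Tuple[List[int], List[Optional[Vector]]]:
    """One round: gated agents broadcast their hidden state, and every agent
    attends over the messages sent by the other agents."""
    senders = [i for i, h in enumerate(hiddens) if send_gate(h, gate_w, gate_b)]
    outputs = []
    for i, h in enumerate(hiddens):
        received = [hiddens[j] for j in senders if j != i]
        outputs.append(aggregate(h, received))
    return senders, outputs


if __name__ == "__main__":
    hiddens = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
    senders, outputs = communication_round(hiddens, gate_w=[1.0, 1.0], gate_b=0.0)
    print(senders)  # [0, 1]: agent 2's gate probability is below 0.5
    print(outputs)  # [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]]
```

In this toy run, only two of the three agents send, which mirrors the reduced-traffic idea in the abstract. A hard threshold like this is not differentiable, so a trained version would need some relaxation of the gate; how COTAC handles this is not stated in the abstract.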

Original language: English
Pages (from-to): 14819-14837
Number of pages: 19
Journal: Applied Intelligence
Volume: 53
Issue number: 12
State: Published, June 2023

Keywords

  • Centralized critic
  • Communication
  • Cooperation
  • Multi-agent system
  • Reinforcement learning
