Learning controlled and targeted communication with the centralized critic for the multi-agent system

Qingshuang Sun, Yuan Yao, Peng Yi, Yu Jiao Hu, Zhao Yang, Gang Yang, Xingshe Zhou

Research output: Contribution to journal › Article › peer-review

4 Citations (Scopus)

Abstract

Multi-agent deep reinforcement learning (MDRL) has attracted attention for solving complex tasks. Two main challenges of MDRL are non-stationarity and partial observability from the perspective of agents, both of which hinder agents' learning of cooperative policies. In this study, Controlled and Targeted Communication with the Centralized Critic (COTAC) is proposed, constructing a paradigm of centralized learning and decentralized execution with partial communication. It decouples how the multi-agent system obtains environmental information during training and during execution. Specifically, COTAC makes the environment faced by agents stationary in the training phase and learns partial communication to overcome the limitation of partial observability in the execution phase. On this basis, decentralized actors learn controlled and targeted communication, with their policies optimized by centralized critics during training. As a result, agents learn both when to communicate when sending and how to aggregate information in a targeted way when receiving. In addition, COTAC is evaluated on two multi-agent scenarios with continuous spaces. Experimental results demonstrate that only the agents holding important information choose to send messages, and receivers aggregate the received information in a targeted way by identifying the relevant important information, yielding better cooperation performance while reducing the communication traffic of the system.
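The abstract describes a gate-then-attend communication pattern on top of centralized training with decentralized execution. The following is a minimal, illustrative PyTorch sketch of that pattern, not the paper's actual implementation: each actor gates whether to send a message (controlled sending), receivers aggregate incoming messages with dot-product attention (targeted receiving), and a centralized critic scores joint observations and actions during training only. All module names, layer sizes, and the straight-through hard gate are assumptions made for illustration.

```python
# Illustrative sketch of controlled (gated) sending and targeted (attention-
# based) receiving with a centralized critic. Architectural details are
# assumptions, not taken from the COTAC paper.
import torch
import torch.nn as nn
import torch.nn.functional as F


class CommActor(nn.Module):
    def __init__(self, obs_dim, act_dim, msg_dim=16, hidden=64):
        super().__init__()
        self.encode = nn.Linear(obs_dim, hidden)
        self.msg_head = nn.Linear(hidden, msg_dim)   # message content
        self.gate_head = nn.Linear(hidden, 1)        # send / stay silent
        self.query = nn.Linear(hidden, msg_dim)      # attention query
        self.policy = nn.Linear(hidden + msg_dim, act_dim)

    def send(self, obs):
        h = torch.relu(self.encode(obs))
        msg = self.msg_head(h)
        # Hard 0/1 gate with a straight-through estimator so the decision
        # stays differentiable during centralized training (assumed trick).
        p = torch.sigmoid(self.gate_head(h))
        gate = (p > 0.5).float() + p - p.detach()
        return h, gate * msg, gate

    def act(self, h, messages):
        # messages: (n_agents, msg_dim); zeroed rows come from silent agents.
        q = self.query(h)
        scores = messages @ q / messages.shape[-1] ** 0.5
        attn = F.softmax(scores, dim=0)              # targeted weights
        aggregated = attn @ messages
        return torch.tanh(self.policy(torch.cat([h, aggregated], -1)))


class CentralizedCritic(nn.Module):
    """Q(joint obs, joint actions): sees everything during training only."""
    def __init__(self, n_agents, obs_dim, act_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_agents * (obs_dim + act_dim), hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, joint_obs, joint_act):
        return self.net(torch.cat([joint_obs, joint_act], dim=-1))


if __name__ == "__main__":
    n, obs_dim, act_dim = 3, 8, 2
    actors = [CommActor(obs_dim, act_dim) for _ in range(n)]
    critic = CentralizedCritic(n, obs_dim, act_dim)

    obs = torch.randn(n, obs_dim)
    hs, msgs, gates = zip(*(a.send(o) for a, o in zip(actors, obs)))
    inbox = torch.stack(msgs)                        # gated messages
    acts = torch.stack([a.act(h, inbox) for a, h in zip(actors, hs)])
    q = critic(obs.flatten(), acts.flatten())
    print("sent:", [int(g.item()) for g in gates], "Q:", q.item())
```

In this sketch, the hard gate is what reduces communication traffic (silent agents contribute all-zero messages), while the straight-through estimator keeps the send decision trainable from the centralized critic's gradient; the attention weights are what make receiving targeted, letting each agent up-weight the messages it finds relevant.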

Original language: English
Pages (from-to): 14819-14837
Number of pages: 19
Journal: Applied Intelligence
Volume: 53
Issue number: 12
DOI
Publication status: Published - Jun 2023
