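基于自注意力机制和策略映射重组的多智能体强化学习算法 (A Multi-Agent Reinforcement Learning Method Based on Self-Attention Mechanism and Policy Mapping Recombination)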

Jing Chen Li; Hao Bin Shi; Kao Shing Hwang

doi:10.11897/SP.J.1016.2022.01842

School of Computer Science

Research output: Contribution to journal › Article › peer-review

6 Citations (Scopus)

Abstract

Multi-Agent Reinforcement Learning (MARL) has been widely applied in the field of group control. Because the Markov decision process for an agent is broken in MARL, existing MARL methods struggle to learn optimal policies, and the learned policies are unstable because of the random behaviors of the agents. From the viewpoint of the mapping between state spaces and behavior spaces, this work studies the coupling among agents in homogeneous MARL, aiming to enhance policy effectiveness and training stability. We first investigate the recombination of the joint behavior space for homogeneous agents, breaking the one-to-one correspondence between agents and policies. Abstract agents are then proposed to transform the coupling among agents into coupling among the actions in the behavior space, which improves training efficiency and stability. Building on this, and inspired by sequential decision-making, we design self-attention modules for the abstract agents' policy networks and evaluation networks respectively, encoding and thinning the states of agents. The learned policies can be explicitly explained through the self-attention module and the recombination. The proposed method is validated in three simulated MARL scenarios. The experimental results suggest that our method can outperform baseline methods in the case of centralized rewards and can increase stability by more than fifty percent. Ablation experiments are designed to validate the abstract agents and the self-attention modules separately, making our conclusions more convincing.

Translated title of the contribution	A Multi-Agent Reinforcement Learning Method Based on Self-Attention Mechanism and Policy Mapping Recombination
Original language	Chinese (Traditional)
Pages (from-to)	1842-1858
Number of pages	17
Journal	Jisuanji Xuebao/Chinese Journal of Computers
Volume	45
Issue number	9
DOI	https://doi.org/10.11897/SP.J.1016.2022.01842
Publication status	Published - Sep 2022

Keywords

Attention mechanism
Deep reinforcement learning
Multi-Agent reinforcement learning
Multi-Agent system

Access to Document

10.11897/SP.J.1016.2022.01842

Other files and links

Link to publication in Scopus

Cite this

@article{bfd205dc226446628051ac7ccc457316,

title = "基于自注意力机制和策略映射重组的多智能体强化学习算法",

abstract = "Multi-Agent Reinforcement Learning (MARL) has been widely applied in the field of group control. Because the Markov decision process for an agent is broken in MARL, existing MARL methods struggle to learn optimal policies, and the learned policies are unstable because of the random behaviors of the agents. From the viewpoint of the mapping between state spaces and behavior spaces, this work studies the coupling among agents in homogeneous MARL, aiming to enhance policy effectiveness and training stability. We first investigate the recombination of the joint behavior space for homogeneous agents, breaking the one-to-one correspondence between agents and policies. Abstract agents are then proposed to transform the coupling among agents into coupling among the actions in the behavior space, which improves training efficiency and stability. Building on this, and inspired by sequential decision-making, we design self-attention modules for the abstract agents' policy networks and evaluation networks respectively, encoding and thinning the states of agents. The learned policies can be explicitly explained through the self-attention module and the recombination. The proposed method is validated in three simulated MARL scenarios. The experimental results suggest that our method can outperform baseline methods in the case of centralized rewards and can increase stability by more than fifty percent. Ablation experiments are designed to validate the abstract agents and the self-attention modules separately, making our conclusions more convincing.",

keywords = "Attention mechanism, Deep reinforcement learning, Multi-Agent reinforcement learning, Multi-Agent system",

author = "Li, {Jing Chen} and Shi, {Hao Bin} and Hwang, {Kao Shing}",

year = "2022",

month = sep,

doi = "10.11897/SP.J.1016.2022.01842",

language = "Chinese (Traditional)",

volume = "45",

pages = "1842--1858",

journal = "Jisuanji Xuebao/Chinese Journal of Computers",

issn = "0254-4164",

publisher = "Science Press",

number = "9",

}

TY - JOUR

T1 - 基于自注意力机制和策略映射重组的多智能体强化学习算法

AU - Li, Jing Chen

AU - Shi, Hao Bin

AU - Hwang, Kao Shing

PY - 2022/9

Y1 - 2022/9

AB - Multi-Agent Reinforcement Learning (MARL) has been widely applied in the field of group control. Because the Markov decision process for an agent is broken in MARL, existing MARL methods struggle to learn optimal policies, and the learned policies are unstable because of the random behaviors of the agents. From the viewpoint of the mapping between state spaces and behavior spaces, this work studies the coupling among agents in homogeneous MARL, aiming to enhance policy effectiveness and training stability. We first investigate the recombination of the joint behavior space for homogeneous agents, breaking the one-to-one correspondence between agents and policies. Abstract agents are then proposed to transform the coupling among agents into coupling among the actions in the behavior space, which improves training efficiency and stability. Building on this, and inspired by sequential decision-making, we design self-attention modules for the abstract agents' policy networks and evaluation networks respectively, encoding and thinning the states of agents. The learned policies can be explicitly explained through the self-attention module and the recombination. The proposed method is validated in three simulated MARL scenarios. The experimental results suggest that our method can outperform baseline methods in the case of centralized rewards and can increase stability by more than fifty percent. Ablation experiments are designed to validate the abstract agents and the self-attention modules separately, making our conclusions more convincing.

KW - Attention mechanism

KW - Deep reinforcement learning

KW - Multi-Agent reinforcement learning

KW - Multi-Agent system

UR - http://www.scopus.com/inward/record.url?scp=85137169706&partnerID=8YFLogxK

U2 - 10.11897/SP.J.1016.2022.01842

DO - 10.11897/SP.J.1016.2022.01842

M3 - Article

AN - SCOPUS:85137169706

SN - 0254-4164

VL - 45

SP - 1842

EP - 1858

JO - Jisuanji Xuebao/Chinese Journal of Computers

JF - Jisuanji Xuebao/Chinese Journal of Computers

IS - 9

ER -
