基于自注意力机制和策略映射重组的多智能体强化学习算法

Jing Chen Li, Hao Bin Shi, Kao Shing Hwang

Research output: Contribution to journal, Article, peer-reviewed

6 citations (Scopus)

Abstract

Multi-Agent Reinforcement Learning (MARL) has been widely applied in the field of group control. Because the Markov decision process of an individual agent is broken in MARL, existing MARL methods struggle to learn optimal policies, and the learned policies are unstable because of the random behaviors of agents. From the viewpoint of the mapping between state spaces and behavior spaces, this work studies the coupling among agents in homogeneous MARL, aiming to enhance policy effectiveness and training stability. We first investigate the recombination of the joint behavior space for homogeneous agents, breaking the one-to-one correspondence between agents and policies. Abstract agents are then proposed to transform the coupling among agents into coupling among the actions in the behavior space, which improves training efficiency and stability. Building on this, and inspired by sequential decision making, we design self-attention modules for the abstract agents' policy networks and evaluation networks respectively, which encode and refine the states of agents. The learned policies can be explicitly explained through the self-attention modules and the recombination. The proposed method is validated in three simulated MARL scenarios. The experimental results suggest that our method can outperform baseline methods in the case of centralized rewards, while improving stability by more than fifty percent. Ablation experiments are designed to validate the abstract agents and the self-attention modules separately, making our conclusions more convincing.

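Translated title of the contribution: A Multi-Agent Reinforcement Learning Method Based on Self-Attention Mechanism and Policy Mapping Recombination
Original language: Traditional Chinese
Pages (from-to): 1842-1858
Number of pages: 17
Journal: Jisuanji Xuebao/Chinese Journal of Computers
Volume: 45
Issue number: 9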
Publication status: Published - Sep 2022

Keywords

  • Attention mechanism
  • Deep reinforcement learning
  • Multi-Agent reinforcement learning
  • Multi-Agent system
