Multi-agent reinforcement learning by the actor-critic model with an attention interface

Lixiang Zhang, Jingchen Li, Yi'an Zhu, Haobin Shi, Kao Shing Hwang

Research output: Contribution to journal › Article › peer-review

20 Scopus citations

Abstract

Multi-agent reinforcement learning algorithms perform well in many scenarios, but many of them struggle in partially observable environments. When agents cannot perceive the full environment state, training becomes unstable and may fail to converge, especially in large-scale multi-agent environments. To improve interactions among homogeneous agents in a partially observable environment, we propose a novel multi-agent actor-critic model with a visual attention interface. First, a recurrent visual attention interface extracts a latent state from each agent's partial observation. These latent states allow agents to focus on several local environments: each agent has a complete perception of its local environment, and the intricate multi-agent environment is untangled through the interaction among the agents that share a local environment. The proposed method trains multi-agent systems with centralized training and decentralized execution. Because the number of agents in a local environment is uncertain, the joint action of the agents is approximated using mean-field theory. Experimental results on the simulation platform suggest that our model outperforms the baselines when training large-scale multi-agent systems in partially observable environments.
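
As an illustration of why a mean-field approximation suits a local environment with an uncertain number of agents, here is a minimal Python sketch of the generic mean-action idea. It is not the authors' implementation: the function name `mean_action`, the discrete action encoding, and the zero vector returned for an empty neighborhood are all assumptions made for this example. The point it shows is that the neighbors' joint action is summarized by one fixed-length vector, however many neighbors there are.

```python
import numpy as np

def mean_action(neighbor_actions, n_actions):
    """Average the one-hot encodings of the neighbors' discrete actions.

    Hypothetical helper for illustration only. The result always has
    length n_actions, regardless of how many neighbors are present.
    """
    if len(neighbor_actions) == 0:
        # Assumption: an agent with no neighbors sees an all-zero summary.
        return np.zeros(n_actions)
    # Row i of the identity matrix is the one-hot vector for action i.
    one_hot = np.eye(n_actions)[np.asarray(neighbor_actions)]
    return one_hot.mean(axis=0)

# Three neighbors and five neighbors yield summaries of the same size.
few = mean_action([0, 0, 2], n_actions=3)
many = mean_action([1, 1, 2, 0, 1], n_actions=3)
```

A critic that takes an agent's own action together with such a summary has a fixed input size, which is what makes the approximation convenient when the number of agents in a local environment varies.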

Original language: English
Pages (from-to): 275-284
Number of pages: 10
Journal: Neurocomputing
Volume: 471
State: Published - 30 Jan 2022

Keywords

  • Actor-critic
  • Attention mechanism
  • Mean-field theory
  • Multi-agent reinforcement learning
  • Multi-agent system
