TY - JOUR
T1 - Skill matters
T2 - Dynamic skill learning for multi-agent cooperative reinforcement learning
AU - Li, Tong
AU - Bai, Chenjia
AU - Xu, Kang
AU - Chu, Chen
AU - Zhu, Peican
AU - Wang, Zhen
N1 - Publisher Copyright:
© 2024 Elsevier Ltd
PY - 2025/1
Y1 - 2025/1
N2 - With the growing deployment of intelligent machines, the need for cooperation among them has made collaborative multi-agent reinforcement learning (MARL) an increasingly active research area. Existing approaches typically address this challenge through task decomposition of the environment or role classification of agents. However, these methods often rely on parameter sharing between agents, which homogenizes agent behavior and is ineffective for complex tasks; moreover, training driven by external rewards adapts poorly to scenarios with sparse rewards. To address these challenges, we propose a novel dynamic skill learning (DSL) framework that enables agents to learn more diverse abilities motivated by intrinsic rewards. Specifically, DSL has two components: (i) Dynamic skill discovery, which encourages the production of meaningful skills by exploring the environment in an unsupervised manner, using the inner product between a skill vector and a trajectory representation to generate intrinsic rewards; meanwhile, a Lipschitz constraint on the state representation function ensures proper trajectories for the learned skills. (ii) Dynamic skill assignment, which uses a policy controller to assign skills to each agent based on its trajectory latent variables. In addition, to avoid the training instability caused by frequent changes in skill selection, we introduce a regularization term that limits skill switching between adjacent time steps. We thoroughly evaluated DSL on two challenging benchmarks, StarCraft II and Google Research Football. Experimental results show that, compared with strong baselines such as QMIX and RODE, DSL effectively improves performance and adapts better to difficult collaborative scenarios.
KW - Diverse behaviors
KW - Multi-agent reinforcement learning
KW - Skill assignment
KW - Skill discovery
UR - http://www.scopus.com/inward/record.url?scp=85208507131&partnerID=8YFLogxK
U2 - 10.1016/j.neunet.2024.106852
DO - 10.1016/j.neunet.2024.106852
M3 - Article
C2 - 39522419
AN - SCOPUS:85208507131
SN - 0893-6080
VL - 181
JO - Neural Networks
JF - Neural Networks
M1 - 106852
ER -