TY - JOUR
T1 - Weighted Mean Field Q-Learning for Large Scale Multiagent Systems
AU - Chen, Zhuoying
AU - Li, Huiping
AU - Wang, Zhaoxu
AU - Yan, Bing
N1 - Publisher Copyright:
© 2005-2012 IEEE.
PY - 2025
Y1 - 2025
N2 - Mean field reinforcement learning (MFRL) addresses the problem of dimensional explosion in large-scale multiagent systems. However, MFRL averages the actions of neighbors equally, discarding the diversity and distinct features of individuals, which may lead to poor performance in many application scenarios. In this article, a new MFRL algorithm termed temporal weighted mean field Q-learning (TWMFQ) is proposed. TWMFQ introduces a temporal compensated multihead attention structure to construct the weighted mean-field framework, which distills the complex relationships within the swarm into interactions between a specific agent and a weighted virtual mean agent. This approach allows the mean Q-function to represent the swarm behavior more informatively and comprehensively. In addition, an advanced sampling mechanism called mixed experience replay is established, which enriches the diversity of samples and prevents the algorithm from falling into locally optimal solutions. Comparison experiments on the MAgent and multi-USV platforms justify the superior performance of TWMFQ across different population sizes.
AB - Mean field reinforcement learning (MFRL) addresses the problem of dimensional explosion in large-scale multiagent systems. However, MFRL averages the actions of neighbors equally, discarding the diversity and distinct features of individuals, which may lead to poor performance in many application scenarios. In this article, a new MFRL algorithm termed temporal weighted mean field Q-learning (TWMFQ) is proposed. TWMFQ introduces a temporal compensated multihead attention structure to construct the weighted mean-field framework, which distills the complex relationships within the swarm into interactions between a specific agent and a weighted virtual mean agent. This approach allows the mean Q-function to represent the swarm behavior more informatively and comprehensively. In addition, an advanced sampling mechanism called mixed experience replay is established, which enriches the diversity of samples and prevents the algorithm from falling into locally optimal solutions. Comparison experiments on the MAgent and multi-USV platforms justify the superior performance of TWMFQ across different population sizes.
KW - Experience replay
KW - mean field reinforcement learning (MFRL)
KW - multi-unmanned surface vehicle (USV)
UR - http://www.scopus.com/inward/record.url?scp=105008668530&partnerID=8YFLogxK
U2 - 10.1109/TII.2025.3575139
DO - 10.1109/TII.2025.3575139
M3 - Article
AN - SCOPUS:105008668530
SN - 1551-3203
JO - IEEE Transactions on Industrial Informatics
JF - IEEE Transactions on Industrial Informatics
ER -