TY - JOUR
T1 - Weighted Mean Field Q-Learning for Large Scale Multiagent Systems
AU - Chen, Zhuoying
AU - Li, Huiping
AU - Wang, Zhaoxu
AU - Yan, Bing
N1 - Publisher Copyright:
© 2005-2012 IEEE.
PY - 2025
Y1 - 2025
N2 - Mean field reinforcement learning (MFRL) addresses the problem of dimensional explosion in large-scale multiagent systems. However, MFRL averages the actions of neighbors equally, discarding the diversity and distinct features of individuals, which may lead to poor performance in many application scenarios. In this article, a new MFRL algorithm termed temporal weighted mean field Q-learning (TWMFQ) is proposed. TWMFQ introduces a temporal compensated multihead attention structure to construct the weighted mean-field framework, which distills the complex relationships within the swarm into interactions between a specific agent and a weighted virtual mean agent. This approach allows the mean Q-function to represent the swarm behavior more informatively and comprehensively. In addition, an advanced sampling mechanism called mixed experience replay is established, which enriches the diversity of samples and prevents the algorithm from falling into locally optimal solutions. Comparison experiments on the MAgent and multi-USV platforms justify the superior performance of TWMFQ across different population sizes.
AB - Mean field reinforcement learning (MFRL) addresses the problem of dimensional explosion in large-scale multiagent systems. However, MFRL averages the actions of neighbors equally, discarding the diversity and distinct features of individuals, which may lead to poor performance in many application scenarios. In this article, a new MFRL algorithm termed temporal weighted mean field Q-learning (TWMFQ) is proposed. TWMFQ introduces a temporal compensated multihead attention structure to construct the weighted mean-field framework, which distills the complex relationships within the swarm into interactions between a specific agent and a weighted virtual mean agent. This approach allows the mean Q-function to represent the swarm behavior more informatively and comprehensively. In addition, an advanced sampling mechanism called mixed experience replay is established, which enriches the diversity of samples and prevents the algorithm from falling into locally optimal solutions. Comparison experiments on the MAgent and multi-USV platforms justify the superior performance of TWMFQ across different population sizes.
KW - Experience replay
KW - mean field reinforcement learning (MFRL)
KW - multi-unmanned surface vehicle (USV)
UR - http://www.scopus.com/inward/record.url?scp=105008668530&partnerID=8YFLogxK
U2 - 10.1109/TII.2025.3575139
DO - 10.1109/TII.2025.3575139
M3 - Article
AN - SCOPUS:105008668530
SN - 1551-3203
JO - IEEE Transactions on Industrial Informatics
JF - IEEE Transactions on Industrial Informatics
ER -