TY - JOUR
T1 - Decentralized Transformers with Centralized Aggregation are Sample-Efficient Multi-Agent World Models
AU - Zhang, Yang
AU - Bai, Chenjia
AU - Zhao, Bin
AU - Yan, Junchi
AU - Li, Xiu
AU - Li, Xuelong
N1 - Publisher Copyright:
© 2025, Transactions on Machine Learning Research. All rights reserved.
PY - 2025/5
Y1 - 2025/5
N2 - Learning a world model for model-free Reinforcement Learning (RL) agents can significantly improve sample efficiency by learning policies in imagination. However, building a world model for Multi-Agent RL (MARL) is particularly challenging, due to the scalability issue across different numbers of agents in a centralized architecture, and the non-stationarity issue in a decentralized architecture that stems from the inter-dependency among agents. To address both challenges, we propose a novel world model for MARL that learns decentralized local dynamics for scalability, combined with centralized representation aggregation over all agents. We cast dynamics learning as an auto-regressive sequence modeling problem over discrete tokens, leveraging the expressive Transformer architecture to model complex local dynamics across different agents and to provide accurate and consistent long-term imagination. As the first Transformer-based world model for multi-agent systems, we introduce a Perceiver Transformer as an effective solution for centralized representation aggregation in this context. Extensive results on the StarCraft Multi-Agent Challenge (SMAC) and MAMuJoCo demonstrate superior sample efficiency and overall performance compared to strong model-free approaches and existing model-based methods.
AB - Learning a world model for model-free Reinforcement Learning (RL) agents can significantly improve sample efficiency by learning policies in imagination. However, building a world model for Multi-Agent RL (MARL) is particularly challenging, due to the scalability issue across different numbers of agents in a centralized architecture, and the non-stationarity issue in a decentralized architecture that stems from the inter-dependency among agents. To address both challenges, we propose a novel world model for MARL that learns decentralized local dynamics for scalability, combined with centralized representation aggregation over all agents. We cast dynamics learning as an auto-regressive sequence modeling problem over discrete tokens, leveraging the expressive Transformer architecture to model complex local dynamics across different agents and to provide accurate and consistent long-term imagination. As the first Transformer-based world model for multi-agent systems, we introduce a Perceiver Transformer as an effective solution for centralized representation aggregation in this context. Extensive results on the StarCraft Multi-Agent Challenge (SMAC) and MAMuJoCo demonstrate superior sample efficiency and overall performance compared to strong model-free approaches and existing model-based methods.
UR - http://www.scopus.com/inward/record.url?scp=105006757196&partnerID=8YFLogxK
M3 - Article
AN - SCOPUS:105006757196
SN - 2835-8856
VL - 2025-May
JO - Transactions on Machine Learning Research
JF - Transactions on Machine Learning Research
ER -