TY - JOUR
T1 - Decentralized Transformers with Centralized Aggregation are Sample-Efficient Multi-Agent World Models
AU - Zhang, Yang
AU - Bai, Chenjia
AU - Zhao, Bin
AU - Yan, Junchi
AU - Li, Xiu
AU - Li, Xuelong
N1 - Publisher Copyright:
© 2025, Transactions on Machine Learning Research. All rights reserved.
PY - 2025/5
Y1 - 2025/5
N2 - Learning a world model for model-free Reinforcement Learning (RL) agents can significantly improve sample efficiency by learning policies in imagination. However, building a world model for Multi-Agent RL (MARL) is particularly challenging, due to the scalability issue across different numbers of agents in a centralized architecture, and the non-stationarity issue in a decentralized architecture that stems from the inter-dependency among agents. To address both challenges, we propose a novel world model for MARL that learns decentralized local dynamics for scalability, combined with centralized representation aggregation over all agents. We cast dynamics learning as an auto-regressive sequence modeling problem over discrete tokens, leveraging the expressive Transformer architecture to model complex local dynamics across different agents and to provide accurate and consistent long-term imagination. As the first Transformer-based world model for multi-agent systems, we introduce a Perceiver Transformer as an effective solution for centralized representation aggregation in this context. Extensive results on the StarCraft Multi-Agent Challenge (SMAC) and MAMuJoCo demonstrate superior sample efficiency and overall performance compared to strong model-free approaches and existing model-based methods.
AB - Learning a world model for model-free Reinforcement Learning (RL) agents can significantly improve sample efficiency by learning policies in imagination. However, building a world model for Multi-Agent RL (MARL) is particularly challenging, due to the scalability issue across different numbers of agents in a centralized architecture, and the non-stationarity issue in a decentralized architecture that stems from the inter-dependency among agents. To address both challenges, we propose a novel world model for MARL that learns decentralized local dynamics for scalability, combined with centralized representation aggregation over all agents. We cast dynamics learning as an auto-regressive sequence modeling problem over discrete tokens, leveraging the expressive Transformer architecture to model complex local dynamics across different agents and to provide accurate and consistent long-term imagination. As the first Transformer-based world model for multi-agent systems, we introduce a Perceiver Transformer as an effective solution for centralized representation aggregation in this context. Extensive results on the StarCraft Multi-Agent Challenge (SMAC) and MAMuJoCo demonstrate superior sample efficiency and overall performance compared to strong model-free approaches and existing model-based methods.
UR - http://www.scopus.com/inward/record.url?scp=105006757196&partnerID=8YFLogxK
M3 - Article
AN - SCOPUS:105006757196
SN - 2835-8856
VL - 2025-May
JO - Transactions on Machine Learning Research
JF - Transactions on Machine Learning Research
ER -