TY - JOUR
T1 - Cournot Policy Model
T2 - Rethinking centralized training in multi-agent reinforcement learning
AU - Li, Jingchen
AU - Yang, Yusen
AU - He, Ziming
AU - Wu, Huarui
AU - Shi, Haobin
AU - Chen, Wenbai
N1 - Publisher Copyright:
© 2024 Elsevier Inc.
PY - 2024/8
Y1 - 2024/8
N2 - This work studies Centralized Training and Decentralized Execution (CTDE), a powerful mechanism for easing multi-agent reinforcement learning. Although centralized evaluation ensures unbiased estimates of the Q-value, peers with unknown policies drive the decentralized policies far from expectation. To achieve a more stable and effective joint policy, we develop a novel game framework, termed the Cournot Policy Model, to enhance CTDE-based multi-agent learning. Combining game theory and reinforcement learning, we regard joint decision-making in a single time step as a Cournot duopoly model, and design a Hetero Variational Auto-Encoder to model the policies of peers during decentralized execution. With a conditional policy, each agent is guided to a stable mixed-strategy equilibrium even as the joint policy evolves over time. We further demonstrate that such an equilibrium must exist under centralized evaluation. We investigate the improvement our method brings to existing centralized learning methods. Experimental results on a comprehensive collection of benchmarks indicate that our approach consistently outperforms baseline methods.
AB - This work studies Centralized Training and Decentralized Execution (CTDE), a powerful mechanism for easing multi-agent reinforcement learning. Although centralized evaluation ensures unbiased estimates of the Q-value, peers with unknown policies drive the decentralized policies far from expectation. To achieve a more stable and effective joint policy, we develop a novel game framework, termed the Cournot Policy Model, to enhance CTDE-based multi-agent learning. Combining game theory and reinforcement learning, we regard joint decision-making in a single time step as a Cournot duopoly model, and design a Hetero Variational Auto-Encoder to model the policies of peers during decentralized execution. With a conditional policy, each agent is guided to a stable mixed-strategy equilibrium even as the joint policy evolves over time. We further demonstrate that such an equilibrium must exist under centralized evaluation. We investigate the improvement our method brings to existing centralized learning methods. Experimental results on a comprehensive collection of benchmarks indicate that our approach consistently outperforms baseline methods.
KW - Machine learning
KW - Multi-agent reinforcement learning
KW - Multi-agent system
UR - http://www.scopus.com/inward/record.url?scp=85195701321&partnerID=8YFLogxK
U2 - 10.1016/j.ins.2024.120983
DO - 10.1016/j.ins.2024.120983
M3 - Article
AN - SCOPUS:85195701321
SN - 0020-0255
VL - 677
JO - Information Sciences
JF - Information Sciences
M1 - 120983
ER -