Abstract
Achieving emergent behaviors under incomplete information is a major challenge in multi-agent game learning. In this paper, we propose a generalized partial-cooperation framework that realizes Nash equilibrium (NE) selection and transition in distributed online Markov games while requiring only local interaction and information sharing. By introducing a graph self-attention mechanism into the policy hierarchy, each agent's behavior logic is decomposed so that the agents collaboratively learn the globally optimal NE point on two distinct time scales. This leads to a bilevel optimization problem: the upper level fine-tunes the reward structure to eliminate suboptimal equilibria, and the lower level learns the optimal policy within the reformulated game. For this non-convex problem with non-unique NEs, we develop a novel algorithm that integrates distributed online optimization with learning theory in a non-stationary environment. Specifically, the lower level uses Q-learning to acquire the optimal policy without any prior knowledge, while the upper level inherits the environmental information explored by the lower level and uses the distributed Alternating Direction Method of Multipliers (ADMM) to adjust the reward-sharing weights. In addition, we give a convergence proof of the alternating learning and optimization iteration. Finally, simulations on the multi-agent prisoner's dilemma and an Unmanned Aerial Vehicle (UAV) coverage-control task demonstrate the effectiveness of the proposed algorithm.
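The bilevel structure described in the abstract can be illustrated with a minimal sketch on the two-agent prisoner's dilemma. This is not the paper's algorithm: the lower level is plain independent tabular Q-learning on shaped rewards r_i' = (1 - w) r_i + w r_j, and a simple sweep over the reward-sharing weight w stands in for the upper-level distributed ADMM update; the graph self-attention policy hierarchy is omitted entirely. All names (`PAYOFF`, `learn_policies`) are illustrative.

```python
import random

# Two-player prisoner's dilemma: action 0 = cooperate, 1 = defect.
# PAYOFF[(my_action, other_action)] is my stage reward.
PAYOFF = {(0, 0): 3.0, (0, 1): 0.0, (1, 0): 5.0, (1, 1): 1.0}

def greedy(q):
    # Greedy action for a single agent's Q-table (ties favor cooperation).
    return 0 if q[0] >= q[1] else 1

def learn_policies(w, episodes=8000, alpha=0.1, eps=0.1, seed=1):
    """Lower level (sketch): each agent runs independent stateless
    Q-learning on the shaped reward r_i' = (1 - w) * r_i + w * r_j,
    where w is the reward-sharing weight fixed by the upper level."""
    rng = random.Random(seed)
    Q = [[0.0, 0.0], [0.0, 0.0]]  # Q[agent][action]
    for _ in range(episodes):
        # Epsilon-greedy action selection for both agents.
        acts = [rng.randrange(2) if rng.random() < eps else greedy(Q[i])
                for i in range(2)]
        r = [PAYOFF[(acts[0], acts[1])], PAYOFF[(acts[1], acts[0])]]
        for i in range(2):
            shaped = (1.0 - w) * r[i] + w * r[1 - i]
            Q[i][acts[i]] += alpha * (shaped - Q[i][acts[i]])
    return [greedy(Q[i]) for i in range(2)]

# "Upper level" (stand-in): a sweep over w replaces the paper's distributed
# ADMM step. With w = 0 defection dominates and both agents learn mutual
# defection; for w > 0.4 cooperation dominates in the shaped game, so the
# learned equilibrium shifts to mutual cooperation.
for w in (0.0, 0.5):
    print(f"w = {w}: joint policy {learn_policies(w)}")
```

The shaped payoffs make the equilibrium shift easy to verify by hand: with w = 0.5 an agent earns 3 for mutual cooperation but only 2.5 for unilateral defection, so cooperation becomes strictly dominant, which is the kind of suboptimal-equilibrium elimination the upper level performs.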
| Original language | English |
|---|---|
| Journal | IEEE Transactions on Signal and Information Processing over Networks |
| DOIs | |
| State | Accepted/In press - 2025 |
Keywords
- Partial cooperation
- bilevel optimization
- graph self-attention mechanism
- policy hierarchy
Title: Hierarchical Learning in Distributed Online Markov Games via Partial Cooperation