Mutual Information-Guided Subtask Selection for Zero-Shot Generalization in Multi-Agent Reinforcement Learning

  • Shijie Wu
  • Yuan Yao
  • Wenqi Zhang
  • Yining Zhu
  • Yujiao Hu
  • Gang Yang
  • Xingshe Zhou

Research output: Contribution to journal › Conference article › peer-review

Abstract

Modular methods, which decompose complex joint policies into function-specific sub-policies, have been widely adopted to enhance asymptotic performance in single-task cooperative multi-agent reinforcement learning (MARL). However, modular policies trained on source tasks often struggle to generalize to unseen scenarios because of variations across tasks, such as mismatched action spaces and divergent state dynamics. To address this challenge, we propose Mutual Information-Guided Subtask Selection (MIGSS), a novel framework that enhances zero-shot generalization in MARL through two key innovations: a Discriminative Group Trajectory Encoder and Global Attention-Driven Coordination. Specifically, the Discriminative Group Trajectory Encoder remaps agent trajectories by maximizing the mutual information between agent trajectories and dynamically assigned groups. This yields group trajectory representations that are consistent across tasks and have broader embedding distributions, which encourages agents in distinct states to select specialized subtasks and thereby promotes functional modularity. Meanwhile, the Global Attention-Driven Coordination employs a global attention mechanism to integrate state information, coordinating group trajectories for expressive credit assignment. Extensive experiments in StarCraft II cooperative scenarios demonstrate that MIGSS significantly outperforms strong zero-shot generalization baselines in both single-task and multi-task settings. Visualization analyses confirm that the learned group trajectories disperse agent trajectories into a consistent and broader embedding space, thereby enhancing subtask modularization.

Keywords

  • Contrastive Learning
  • Multi-Agent Reinforcement Learning
  • Multi-Task Learning
  • Subtask Decomposition
  • Zero-Shot Generalization
