TY - JOUR
T1 - Mutual Information-Guided Subtask Selection for Zero-Shot Generalization in Multi-Agent Reinforcement Learning
AU - Wu, Shijie
AU - Yao, Yuan
AU - Zhang, Wenqi
AU - Zhu, Yining
AU - Hu, Yujiao
AU - Yang, Gang
AU - Zhou, Xingshe
N1 - Publisher Copyright:
© 2025 IEEE.
PY - 2025
Y1 - 2025
N2 - Modular methods, which decompose complex joint policies into function-specific sub-policies, have been widely adopted to enhance asymptotic performance in single-task cooperative multi-agent reinforcement learning (MARL). However, modular policies trained on source tasks often struggle to generalize to unseen scenarios due to variations across tasks, such as mismatched action spaces and divergent state dynamics. To address this challenge, we propose Mutual Information-Guided Subtask Selection (MIGSS), a novel framework that enhances zero-shot generalization in MARL through two key innovations: a Discriminative Group Trajectory Encoder and Global Attention-Driven Coordination. Specifically, the Discriminative Group Trajectory Encoder remaps agent trajectories by maximizing the mutual information between agent trajectories and dynamically assigned groups, yielding cross-task-consistent group trajectories with broader embedding distributions. This encourages agents in distinct states to select specialized subtasks, effectively promoting functional modularity. Meanwhile, the Global Attention-Driven Coordination employs a global attention mechanism to integrate state information, coordinating group trajectories for expressive credit assignment. Extensive experiments in StarCraft II cooperative scenarios demonstrate that MIGSS significantly outperforms strong zero-shot generalization baselines in both single-task and multi-task settings. Visualization analyses confirm that the learned group trajectories disperse agent trajectories into a consistent and broader embedding space, thereby enhancing subtask modularization.
AB - Modular methods, which decompose complex joint policies into function-specific sub-policies, have been widely adopted to enhance asymptotic performance in single-task cooperative multi-agent reinforcement learning (MARL). However, modular policies trained on source tasks often struggle to generalize to unseen scenarios due to variations across tasks, such as mismatched action spaces and divergent state dynamics. To address this challenge, we propose Mutual Information-Guided Subtask Selection (MIGSS), a novel framework that enhances zero-shot generalization in MARL through two key innovations: a Discriminative Group Trajectory Encoder and Global Attention-Driven Coordination. Specifically, the Discriminative Group Trajectory Encoder remaps agent trajectories by maximizing the mutual information between agent trajectories and dynamically assigned groups, yielding cross-task-consistent group trajectories with broader embedding distributions. This encourages agents in distinct states to select specialized subtasks, effectively promoting functional modularity. Meanwhile, the Global Attention-Driven Coordination employs a global attention mechanism to integrate state information, coordinating group trajectories for expressive credit assignment. Extensive experiments in StarCraft II cooperative scenarios demonstrate that MIGSS significantly outperforms strong zero-shot generalization baselines in both single-task and multi-task settings. Visualization analyses confirm that the learned group trajectories disperse agent trajectories into a consistent and broader embedding space, thereby enhancing subtask modularization.
KW - Contrastive Learning
KW - Multi-Agent Reinforcement Learning
KW - Multi-Task Learning
KW - Subtask Decomposition
KW - Zero-Shot Generalization
UR - https://www.scopus.com/pages/publications/105029233300
U2 - 10.1109/IJCNN64981.2025.11229170
DO - 10.1109/IJCNN64981.2025.11229170
M3 - Conference article
AN - SCOPUS:105029233300
SN - 2161-4393
JO - Proceedings of the International Joint Conference on Neural Networks
JF - Proceedings of the International Joint Conference on Neural Networks
T2 - 2025 International Joint Conference on Neural Networks, IJCNN 2025
Y2 - 30 June 2025 through 5 July 2025
ER -