
Unsupervised learning-enhanced neighborhood relationship utilization for intrinsic reward-driven multi-agent exploration

  • Northwestern Polytechnical University, Xi'an

Research output: Contribution to journal › Article › peer-review

Abstract

Multi-agent reinforcement learning (MARL) in sparse-reward environments often suffers from unreliable, uncoordinated exploration among neighboring agents. We propose Temporal Contrastive Distillation (TCD), a novel plug-and-play progressive mutual calibration architecture that establishes dynamic coordination signals for decentralized agents. Unlike conventional intrinsic reward distillation, TCD uses two modules for mutual calibration: (1) the Adaptive Attention Operator (AAO), an attention-based intrinsic reward distillation module, detects emerging neighborhood-level coordination patterns; (2) the Attention Operator Evolver (AOE), driven by contrastive learning, achieves dual coordination via Contrastive Parameter Adaptation (CPA), which generates operator-updating signals, and Momentum-guided Progressive Transfer (MPT), which transfers these signals to guide AAO evolution. Through their interaction, TCD enables agents to recognize and exploit neighborhood relationships despite sparse rewards. Extensive experiments on StarCraft II (SMAC) and Google Research Football (GRF) show that TCD improves performance and sample efficiency over strong baselines, helping agents discover and refine complex coordination tactics, from micromanagement in SMAC to dynamic passing in GRF, highlighting TCD's broad applicability.
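The abstract's two core mechanisms can be sketched in miniature. The toy code below is an illustrative assumption, not the paper's implementation: an attention operator scores each agent's observation against an attention-weighted neighborhood consensus (standing in for the AAO's neighborhood-level intrinsic reward), and an exponential-moving-average parameter transfer stands in for Momentum-guided Progressive Transfer from an evolver network into the operator. All class and function names here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

class AttentionOperator:
    """Toy stand-in for the AAO: attention over neighboring agents'
    observation embeddings yields a per-agent intrinsic reward."""
    def __init__(self, dim):
        self.Wq = rng.normal(scale=0.1, size=(dim, dim))
        self.Wk = rng.normal(scale=0.1, size=(dim, dim))

    def intrinsic_reward(self, obs):
        # obs: (n_agents, dim); each agent attends to its neighbors
        q, k = obs @ self.Wq, obs @ self.Wk
        attn = softmax(q @ k.T / np.sqrt(obs.shape[1]))
        # reward = deviation of an agent's observation from the
        # attention-weighted neighborhood consensus (novelty signal)
        consensus = attn @ obs
        return np.linalg.norm(obs - consensus, axis=1)

def momentum_transfer(target, online, tau=0.99):
    """Toy stand-in for MPT: EMA transfer of evolver (online)
    parameters into the attention operator (target)."""
    for name in ("Wq", "Wk"):
        setattr(target, name,
                tau * getattr(target, name) + (1 - tau) * getattr(online, name))

aao, aoe = AttentionOperator(4), AttentionOperator(4)
obs = rng.normal(size=(3, 4))          # 3 agents, 4-dim observations
r_int = aao.intrinsic_reward(obs)      # one intrinsic reward per agent
momentum_transfer(aao, aoe, tau=0.99)  # progressive parameter transfer
print(r_int.shape)                     # (3,)
```

In the actual architecture the evolver's update signals come from a contrastive objective (CPA); here the two networks are simply initialized independently to keep the sketch self-contained.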

Original language: English
Article number: 115540
Journal: Knowledge-Based Systems
Volume: 339
DOI
Publication status: Published - 22 Apr 2026
