Multi-Agent Deep Deterministic Policy Gradient Algorithm Based on Classification Experience Replay

Xiaoying Sun, Jinchao Chen, Chenglie Du, Mengying Zhan

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

2 Citations (Scopus)

Abstract

In recent years, multi-agent reinforcement learning has been applied in many fields, such as urban traffic control and autonomous UAV operations. Although the Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithm has been used in various simulation environments as a classic reinforcement learning algorithm, its original experience replay mechanism and network structure make training inefficient and convergence slow. The random experience replay mechanism adopted by the algorithm breaks the temporal correlation between data samples, but it does not take advantage of important samples. Therefore, the paper proposes a Multi-Agent Deep Deterministic Policy Gradient method based on classification experience replay, which replaces the traditional random experience replay with classification experience replay so that classified storage can make full use of important samples. At the same time, the Critic network and the Actor network are updated asynchronously, and the better-learned Critic network is used to guide the Actor network update. Finally, to verify the effectiveness of the proposed algorithm, the improved algorithm is compared with the traditional MADDPG method in a simulation environment.
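
The abstract does not state how samples are classified or how the asynchronous update is scheduled, so the following is only a minimal sketch of the general idea under loudly stated assumptions. It assumes two pools split by a reward threshold, a fixed fraction of each batch drawn from the "important" pool, and an Actor that is updated less often than the Critic. The names `ClassifiedReplayBuffer`, `reward_threshold`, `high_fraction`, `should_update_actor`, and `actor_delay` are hypothetical and do not come from the paper.

```python
import random
from collections import deque


class ClassifiedReplayBuffer:
    """Two-pool replay buffer. The reward-threshold split is an assumption,
    not the classification rule used in the paper."""

    def __init__(self, capacity, reward_threshold=0.0, high_fraction=0.5, seed=None):
        self.high = deque(maxlen=capacity)  # transitions treated as "important"
        self.low = deque(maxlen=capacity)   # all other transitions
        self.reward_threshold = reward_threshold
        self.high_fraction = high_fraction
        self.rng = random.Random(seed)

    def add(self, state, action, reward, next_state, done):
        # Classify at storage time so sampling can target each pool directly.
        pool = self.high if reward > self.reward_threshold else self.low
        pool.append((state, action, reward, next_state, done))

    def __len__(self):
        return len(self.high) + len(self.low)

    def sample(self, batch_size):
        # Draw up to high_fraction of the batch from the important pool and
        # fill the rest from the other pool. The batch can be smaller than
        # batch_size when the pools do not hold enough transitions.
        n_high = min(int(batch_size * self.high_fraction), len(self.high))
        n_low = min(batch_size - n_high, len(self.low))
        batch = self.rng.sample(list(self.high), n_high)
        batch += self.rng.sample(list(self.low), n_low)
        self.rng.shuffle(batch)
        return batch


def should_update_actor(step, actor_delay=2):
    """Hypothetical schedule: the Critic is updated every step, the Actor
    only every actor_delay steps, so it follows a better-trained Critic."""
    return step % actor_delay == 0
```

In a training loop, each agent would call `add` after every environment step, update its Critic from `sample(batch_size)` on each iteration, and update its Actor only when `should_update_actor(step)` returns true. Sampling within each pool is still uniformly random, so the decorrelation benefit of random replay is kept while important samples appear in every batch.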

Original language: English
Title of host publication: IEEE 6th Advanced Information Technology, Electronic and Automation Control Conference, IAEAC 2022
Editors: Bing Xu
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 988-992
Number of pages: 5
ISBN (Electronic): 9781665458641
DOI
Publication status: Published - 2022
Event: 6th IEEE Advanced Information Technology, Electronic and Automation Control Conference, IAEAC 2022 - Beijing, China
Duration: 3 Oct 2022 → 5 Oct 2022

Publication series

Name: IEEE Advanced Information Technology, Electronic and Automation Control Conference (IAEAC)
Volume: 2022-October
ISSN (Print): 2689-6621

Conference

Conference: 6th IEEE Advanced Information Technology, Electronic and Automation Control Conference, IAEAC 2022
Country/Territory: China
City: Beijing
Period: 3/10/22 → 5/10/22
