TY - GEN
T1 - Multi-Agent Deep Deterministic Policy Gradient Algorithm Based on Classification Experience Replay
AU - Sun, Xiaoying
AU - Chen, Jinchao
AU - Du, Chenglie
AU - Zhan, Mengying
N1 - Publisher Copyright:
© 2022 IEEE.
PY - 2022
Y1 - 2022
N2 - In recent years, multi-agent reinforcement learning has been applied in many fields, such as urban traffic control and autonomous UAV operations. Although the Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithm, a classic multi-agent reinforcement learning algorithm, has been used in various simulation environments, its training efficiency is low and its convergence is slow because of its original experience replay mechanism and network structure. The random experience replay mechanism adopted by the algorithm breaks the temporal correlation between data samples, but it does not take advantage of important samples. Therefore, this paper proposes a Multi-Agent Deep Deterministic Policy Gradient method based on classification experience replay, which replaces the traditional random experience replay with classification experience replay. Classified storage makes full use of important samples. In addition, the Critic network and the Actor network are updated asynchronously, and the better-trained Critic network is used to guide the updates of the Actor network. Finally, to verify the effectiveness of the proposed algorithm, the improved algorithm is compared with the traditional MADDPG method in a simulation environment.
AB - In recent years, multi-agent reinforcement learning has been applied in many fields, such as urban traffic control and autonomous UAV operations. Although the Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithm, a classic multi-agent reinforcement learning algorithm, has been used in various simulation environments, its training efficiency is low and its convergence is slow because of its original experience replay mechanism and network structure. The random experience replay mechanism adopted by the algorithm breaks the temporal correlation between data samples, but it does not take advantage of important samples. Therefore, this paper proposes a Multi-Agent Deep Deterministic Policy Gradient method based on classification experience replay, which replaces the traditional random experience replay with classification experience replay. Classified storage makes full use of important samples. In addition, the Critic network and the Actor network are updated asynchronously, and the better-trained Critic network is used to guide the updates of the Actor network. Finally, to verify the effectiveness of the proposed algorithm, the improved algorithm is compared with the traditional MADDPG method in a simulation environment.
KW - classification experience replay
KW - deep reinforcement learning
KW - multi-agent systems
KW - overfitting
KW - reinforcement learning
UR - http://www.scopus.com/inward/record.url?scp=85142223478&partnerID=8YFLogxK
U2 - 10.1109/IAEAC54830.2022.9929494
DO - 10.1109/IAEAC54830.2022.9929494
M3 - Conference contribution
AN - SCOPUS:85142223478
T3 - IEEE Advanced Information Technology, Electronic and Automation Control Conference (IAEAC)
SP - 988
EP - 992
BT - IEEE 6th Advanced Information Technology, Electronic and Automation Control Conference, IAEAC 2022
A2 - Xu, Bing
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 6th IEEE Advanced Information Technology, Electronic and Automation Control Conference, IAEAC 2022
Y2 - 3 October 2022 through 5 October 2022
ER -
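
The abstract above describes two modifications to MADDPG: classification experience replay and asynchronous Critic/Actor updates. Below is a minimal Python sketch of the replay idea. The record does not specify the authors' classification rule, so the TD-error threshold, the two-pool split, and the mixed sampling ratio are all assumptions made for illustration, not the paper's implementation.

```python
# A minimal, runnable sketch of "classification experience replay".
# ASSUMPTION: transitions are split into an "important" pool and an
# "ordinary" pool by TD-error magnitude, and each minibatch mixes the
# two pools at a fixed ratio. All names, thresholds, and ratios are
# hypothetical; the abstract only says classified storage is used to
# make full use of important samples.
import random
from collections import deque


class ClassifiedReplayBuffer:
    """Two-pool replay buffer: classified storage plus mixed-ratio sampling."""

    def __init__(self, capacity=100_000, td_threshold=1.0, important_ratio=0.5):
        self.important = deque(maxlen=capacity // 2)  # high-|TD-error| transitions
        self.ordinary = deque(maxlen=capacity // 2)   # everything else
        self.td_threshold = td_threshold              # assumed split criterion
        self.important_ratio = important_ratio        # share drawn per minibatch

    def add(self, transition, td_error):
        # Classified storage: route each transition to a pool by importance.
        pool = self.important if abs(td_error) >= self.td_threshold else self.ordinary
        pool.append(transition)

    def sample(self, batch_size):
        # Oversample the important pool relative to uniform replay; early in
        # training, when a pool is still small, fewer samples may be returned.
        n_imp = min(int(batch_size * self.important_ratio), len(self.important))
        n_ord = min(batch_size - n_imp, len(self.ordinary))
        return (random.sample(list(self.important), n_imp)
                + random.sample(list(self.ordinary), n_ord))


if __name__ == "__main__":
    buf = ClassifiedReplayBuffer()
    for i in range(1000):
        buf.add(("s", "a", float(i % 5), "s'"), td_error=random.gauss(0.0, 1.5))
    batch = buf.sample(32)
    print(len(batch), "transitions sampled (mixed important/ordinary)")
```

The asynchronous update the abstract mentions could be realized as a simple schedule in which the Critic takes a gradient step every iteration while the Actor is updated only every few iterations, so the policy gradient is computed against a better-trained Critic; this is reminiscent of the delayed policy updates in TD3, though the exact schedule used in the paper is not given in this record.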