TY - GEN
T1 - Multi-Agent Deep Deterministic Policy Gradient Algorithm Based on Classification Experience Replay
AU - Sun, Xiaoying
AU - Chen, Jinchao
AU - Du, Chenglie
AU - Zhan, Mengying
N1 - Publisher Copyright:
© 2022 IEEE.
PY - 2022
Y1 - 2022
N2 - In recent years, multi-agent reinforcement learning has been applied in many fields, such as urban traffic control and autonomous UAV operations. Although the Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithm, a classic multi-agent reinforcement learning algorithm, has been used in various simulation environments, its training efficiency is low and its convergence is slow because of its original experience replay mechanism and network structure. The random experience replay mechanism adopted by the algorithm breaks the temporal correlation between data samples, but it does not take advantage of important samples. Therefore, this paper proposes a Multi-Agent Deep Deterministic Policy Gradient method based on classification experience replay, which replaces the traditional random experience replay with classification experience replay. Classified storage makes full use of important samples. In addition, the Critic network and the Actor network are updated asynchronously, and the better-trained Critic network is used to guide the updates of the Actor network. Finally, to verify the effectiveness of the proposed algorithm, the improved algorithm is compared with the traditional MADDPG method in a simulation environment.
AB - In recent years, multi-agent reinforcement learning has been applied in many fields, such as urban traffic control and autonomous UAV operations. Although the Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithm, a classic multi-agent reinforcement learning algorithm, has been used in various simulation environments, its training efficiency is low and its convergence is slow because of its original experience replay mechanism and network structure. The random experience replay mechanism adopted by the algorithm breaks the temporal correlation between data samples, but it does not take advantage of important samples. Therefore, this paper proposes a Multi-Agent Deep Deterministic Policy Gradient method based on classification experience replay, which replaces the traditional random experience replay with classification experience replay. Classified storage makes full use of important samples. In addition, the Critic network and the Actor network are updated asynchronously, and the better-trained Critic network is used to guide the updates of the Actor network. Finally, to verify the effectiveness of the proposed algorithm, the improved algorithm is compared with the traditional MADDPG method in a simulation environment.
KW - classification experience replay
KW - deep reinforcement learning
KW - multi-agent systems
KW - overfitting
KW - reinforcement learning
UR - http://www.scopus.com/inward/record.url?scp=85142223478&partnerID=8YFLogxK
U2 - 10.1109/IAEAC54830.2022.9929494
DO - 10.1109/IAEAC54830.2022.9929494
M3 - Conference contribution
AN - SCOPUS:85142223478
T3 - IEEE Advanced Information Technology, Electronic and Automation Control Conference (IAEAC)
SP - 988
EP - 992
BT - IEEE 6th Advanced Information Technology, Electronic and Automation Control Conference, IAEAC 2022
A2 - Xu, Bing
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 6th IEEE Advanced Information Technology, Electronic and Automation Control Conference, IAEAC 2022
Y2 - 3 October 2022 through 5 October 2022
ER -
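
The abstract above describes two modifications to MADDPG: classification experience replay and asynchronous Critic/Actor updates. Below is a minimal Python sketch of the replay idea. The record does not specify the authors' classification rule, so the TD-error threshold, the two-pool split, and the mixed sampling ratio are all assumptions made for illustration, not the paper's implementation.

```python
# A minimal, runnable sketch of "classification experience replay".
# ASSUMPTION: transitions are split into an "important" pool and an
# "ordinary" pool by TD-error magnitude, and each minibatch mixes the
# two pools at a fixed ratio. All names, thresholds, and ratios are
# hypothetical; the abstract only says classified storage is used to
# make full use of important samples.
import random
from collections import deque


class ClassifiedReplayBuffer:
    """Two-pool replay buffer: classified storage plus mixed-ratio sampling."""

    def __init__(self, capacity=100_000, td_threshold=1.0, important_ratio=0.5):
        self.important = deque(maxlen=capacity // 2)  # high-|TD-error| transitions
        self.ordinary = deque(maxlen=capacity // 2)   # everything else
        self.td_threshold = td_threshold              # assumed split criterion
        self.important_ratio = important_ratio        # share drawn per minibatch

    def add(self, transition, td_error):
        # Classified storage: route each transition to a pool by importance.
        pool = self.important if abs(td_error) >= self.td_threshold else self.ordinary
        pool.append(transition)

    def sample(self, batch_size):
        # Oversample the important pool relative to uniform replay; early in
        # training, when a pool is still small, fewer samples may be returned.
        n_imp = min(int(batch_size * self.important_ratio), len(self.important))
        n_ord = min(batch_size - n_imp, len(self.ordinary))
        return (random.sample(list(self.important), n_imp)
                + random.sample(list(self.ordinary), n_ord))


if __name__ == "__main__":
    buf = ClassifiedReplayBuffer()
    for i in range(1000):
        buf.add(("s", "a", float(i % 5), "s'"), td_error=random.gauss(0.0, 1.5))
    batch = buf.sample(32)
    print(len(batch), "transitions sampled (mixed important/ordinary)")
```

The asynchronous update the abstract mentions could be realized as a simple schedule in which the Critic takes a gradient step every iteration while the Actor is updated only every few iterations, so the policy gradient is computed against a better-trained Critic; this is reminiscent of the delayed policy updates in TD3, though the exact schedule used in the paper is not given in this record.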