Twin Delayed Multi-Agent Deep Deterministic Policy Gradient

Mengying Zhan; Jinchao Chen; Chenglie Du; Yuxin Duan

doi:10.1109/PIC53636.2021.9687069

School of Computer Science

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

7 Citations (Scopus)

Abstract

Recently, reinforcement learning has made remarkable achievements in natural science, engineering, medicine and operational research. Reinforcement learning addresses sequential problems and considers long-term returns, and this long-term view is critical to finding the optimal solution to many problems. Existing multi-agent reinforcement learning algorithms tend to overestimate the Q value. Unfortunately, overestimation in multi-agent reinforcement learning has received little study, even though it affects learning efficiency. Building on the traditional multi-agent reinforcement learning algorithm, this paper improves the actor network and critic network, reduces the overestimation of the Q value and adopts a delayed update method to make actor training more stable. To test the effectiveness of the algorithm structure, the modified method is compared with the traditional MADDPG, DDPG and DQN methods in a simulation environment.
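The abstract does not spell out how overestimation is reduced or how the update is delayed. As a minimal sketch, assuming the TD3-style mechanisms that the title suggests (bootstrapping from the smaller of two critic estimates, and updating the actor less often than the critics), the two ideas might look like the following. The function names, the `policy_delay` parameter and the toy numbers are illustrative assumptions, not taken from the paper.

```python
def clipped_double_q_target(reward, done, q1_next, q2_next, gamma=0.99):
    """TD target that bootstraps from the smaller of two target-critic estimates.

    Assumption: in a multi-agent setting, q1_next and q2_next would come from
    two centralized target critics evaluated on the next joint state and the
    target actors' joint action. Here they are plain floats.
    """
    bootstrap = min(q1_next, q2_next)  # pessimistic estimate curbs overestimation
    return reward + gamma * (1.0 - done) * bootstrap


def should_update_actor(step, policy_delay=2):
    """Delayed update: refresh the actor only every `policy_delay` critic steps."""
    return step % policy_delay == 0


# Toy numbers (hypothetical): the two target critics disagree about the next value.
target = clipped_double_q_target(
    reward=1.0, done=0.0, q1_next=5.0, q2_next=3.0, gamma=0.9
)

# Critic steps 1..6 with policy_delay=2: the actor is updated on even steps only.
actor_steps = [step for step in range(1, 7) if should_update_actor(step)]
```

Taking the minimum biases the target downward, which counteracts the upward bias of a single learned critic; the delay lets the critics settle before the actor follows their gradient.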

Original language	English
Title of host publication	Proceedings of the 2021 IEEE International Conference on Progress in Informatics and Computing, PIC 2021
Editors	Yinglin Wang, Zheying Zhang
Publisher	Institute of Electrical and Electronics Engineers Inc.
Pages	48-52
Number of pages	5
ISBN (Electronic)	9781665426558
DOI	https://doi.org/10.1109/PIC53636.2021.9687069
Publication status	Published - 2021
Event	8th IEEE International Conference on Progress in Informatics and Computing, PIC 2021 - Virtual, Online, China. Duration: 17 Dec 2021 → 19 Dec 2021

Publication series

Name	Proceedings of the 2021 IEEE International Conference on Progress in Informatics and Computing, PIC 2021

Conference

Conference	8th IEEE International Conference on Progress in Informatics and Computing, PIC 2021
Country/Territory	China
City	Virtual, Online
Period	17/12/21 → 19/12/21

Cite this

Zhan, M., Chen, J., Du, C., & Duan, Y. (2021). Twin Delayed Multi-Agent Deep Deterministic Policy Gradient. In Y. Wang, & Z. Zhang (Eds.), Proceedings of the 2021 IEEE International Conference on Progress in Informatics and Computing, PIC 2021 (pp. 48-52). (Proceedings of the 2021 IEEE International Conference on Progress in Informatics and Computing, PIC 2021). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/PIC53636.2021.9687069

Zhan, Mengying; Chen, Jinchao; Du, Chenglie et al. / Twin Delayed Multi-Agent Deep Deterministic Policy Gradient. Proceedings of the 2021 IEEE International Conference on Progress in Informatics and Computing, PIC 2021. editor / Yinglin Wang; Zheying Zhang. Institute of Electrical and Electronics Engineers Inc., 2021. pp. 48-52 (Proceedings of the 2021 IEEE International Conference on Progress in Informatics and Computing, PIC 2021).

@inproceedings{dfdd0d890400415b8ae73abe0dc80bd2,

title = "Twin Delayed Multi-Agent Deep Deterministic Policy Gradient",

abstract = "Recently, reinforcement learning has made remarkable achievements in the fields of natural science, engineering, medicine and operational research. Reinforcement learning addresses sequence problems and considers long-term returns. This long-term view of reinforcement learning is critical to find the optimal solution of many problems. The existing multi-agent reinforcement learning algorithms have the problem of overestimation in estimating the Q value. Unfortunately, there have not been many studies on overestimation of agent reinforcement learning, which will affect the learning efficiency of reinforcement learning. Based on the traditional multi-agent reinforcement learning algorithm, this paper improves the actor network and critic network, optimizes the overestimation of Q value and adopts the update delayed method to make the actor training more stable. In order to test the effectiveness of the algorithm structure, the modified method is compared with the traditional MADDPG, DDPG and DQN methods in the simulation environment.",

keywords = "Deep learning, multi-agent system, neural networks, overestimation, Reinforcement learning",

author = "Mengying Zhan and Jinchao Chen and Chenglie Du and Yuxin Duan",

note = "Publisher Copyright: {\textcopyright} 2021 IEEE.; 8th IEEE International Conference on Progress in Informatics and Computing, PIC 2021 ; Conference date: 17-12-2021 Through 19-12-2021",

year = "2021",

doi = "10.1109/PIC53636.2021.9687069",

language = "English",

series = "Proceedings of the 2021 IEEE International Conference on Progress in Informatics and Computing, PIC 2021",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

pages = "48--52",

editor = "Yinglin Wang and Zheying Zhang",

booktitle = "Proceedings of the 2021 IEEE International Conference on Progress in Informatics and Computing, PIC 2021",

}

Zhan, M, Chen, J, Du, C & Duan, Y 2021, Twin Delayed Multi-Agent Deep Deterministic Policy Gradient. in Y Wang & Z Zhang (eds), Proceedings of the 2021 IEEE International Conference on Progress in Informatics and Computing, PIC 2021. Proceedings of the 2021 IEEE International Conference on Progress in Informatics and Computing, PIC 2021, Institute of Electrical and Electronics Engineers Inc., pp. 48-52, 8th IEEE International Conference on Progress in Informatics and Computing, PIC 2021, Virtual, Online, China, 17/12/21. https://doi.org/10.1109/PIC53636.2021.9687069

Twin Delayed Multi-Agent Deep Deterministic Policy Gradient. / Zhan, Mengying; Chen, Jinchao; Du, Chenglie et al.
Proceedings of the 2021 IEEE International Conference on Progress in Informatics and Computing, PIC 2021. editor / Yinglin Wang; Zheying Zhang. Institute of Electrical and Electronics Engineers Inc., 2021. pp. 48-52 (Proceedings of the 2021 IEEE International Conference on Progress in Informatics and Computing, PIC 2021).

TY - GEN

T1 - Twin Delayed Multi-Agent Deep Deterministic Policy Gradient

AU - Zhan, Mengying

AU - Chen, Jinchao

AU - Du, Chenglie

AU - Duan, Yuxin

PY - 2021

Y1 - 2021

AB - Recently, reinforcement learning has made remarkable achievements in the fields of natural science, engineering, medicine and operational research. Reinforcement learning addresses sequence problems and considers long-term returns. This long-term view of reinforcement learning is critical to find the optimal solution of many problems. The existing multi-agent reinforcement learning algorithms have the problem of overestimation in estimating the Q value. Unfortunately, there have not been many studies on overestimation of agent reinforcement learning, which will affect the learning efficiency of reinforcement learning. Based on the traditional multi-agent reinforcement learning algorithm, this paper improves the actor network and critic network, optimizes the overestimation of Q value and adopts the update delayed method to make the actor training more stable. In order to test the effectiveness of the algorithm structure, the modified method is compared with the traditional MADDPG, DDPG and DQN methods in the simulation environment.

KW - Deep learning

KW - multi-agent system

KW - neural networks

KW - overestimation

KW - Reinforcement learning

UR - http://www.scopus.com/inward/record.url?scp=85125814829&partnerID=8YFLogxK

U2 - 10.1109/PIC53636.2021.9687069

DO - 10.1109/PIC53636.2021.9687069

M3 - Conference contribution

AN - SCOPUS:85125814829

T3 - Proceedings of the 2021 IEEE International Conference on Progress in Informatics and Computing, PIC 2021

SP - 48

EP - 52

BT - Proceedings of the 2021 IEEE International Conference on Progress in Informatics and Computing, PIC 2021

A2 - Wang, Yinglin

A2 - Zhang, Zheying

PB - Institute of Electrical and Electronics Engineers Inc.

T2 - 8th IEEE International Conference on Progress in Informatics and Computing, PIC 2021

Y2 - 17 December 2021 through 19 December 2021

ER -

Zhan M, Chen J, Du C, Duan Y. Twin Delayed Multi-Agent Deep Deterministic Policy Gradient. In Wang Y, Zhang Z, editors, Proceedings of the 2021 IEEE International Conference on Progress in Informatics and Computing, PIC 2021. Institute of Electrical and Electronics Engineers Inc. 2021. p. 48-52. (Proceedings of the 2021 IEEE International Conference on Progress in Informatics and Computing, PIC 2021). doi: 10.1109/PIC53636.2021.9687069
