Twin Delayed Multi-Agent Deep Deterministic Policy Gradient

Mengying Zhan; Jinchao Chen; Chenglie Du; Yuxin Duan

doi:10.1109/PIC53636.2021.9687069

School of Computer Science

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

7 Scopus citations

Abstract

Recently, reinforcement learning has achieved remarkable results in natural science, engineering, medicine, and operational research. Reinforcement learning addresses sequential decision problems and optimizes long-term returns, a view that is critical to finding optimal solutions to many problems. Existing multi-agent reinforcement learning algorithms, however, tend to overestimate Q values. Few studies have examined overestimation in multi-agent reinforcement learning, even though it degrades learning efficiency. Building on the traditional multi-agent reinforcement learning algorithm, this paper improves the actor and critic networks, mitigates the overestimation of Q values, and adopts delayed updates to make actor training more stable. To test the effectiveness of the algorithm structure, the modified method is compared with the traditional MADDPG, DDPG, and DQN methods in a simulation environment.
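The abstract does not spell out the update rules. Given "Twin Delayed" in the title, the sketch below assumes the method carries TD3's two core ingredients over to MADDPG's centralized critics: a TD target built from the minimum of two target critics, and actor updates that run only once every few critic steps. TD3's target-policy smoothing noise is not mentioned in the abstract and is omitted here. All function names, the critic signature, `gamma`, `tau`, and `policy_delay` are illustrative assumptions, not values taken from the paper.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical critic signature: a centralized (MADDPG-style) critic maps the
# joint observation and joint action of all agents to a scalar Q estimate.
Critic = Callable[[Sequence[float], Sequence[float]], float]


def clipped_double_q_target(
    reward: float,
    done: bool,
    next_joint_obs: Sequence[float],
    next_joint_action: Sequence[float],
    target_critic_1: Critic,
    target_critic_2: Critic,
    gamma: float = 0.95,  # assumed discount factor
) -> float:
    """TD target that bootstraps from the smaller of two target-critic estimates.

    Taking the minimum counteracts the upward bias that arises when the actor
    is trained to maximize a single, noisy critic (the overestimation problem).
    """
    if done:
        return reward  # no bootstrap past a terminal transition
    q1 = target_critic_1(next_joint_obs, next_joint_action)
    q2 = target_critic_2(next_joint_obs, next_joint_action)
    return reward + gamma * min(q1, q2)


def should_update_actor(step: int, policy_delay: int = 2) -> bool:
    """Delayed policy update: the actor moves once every `policy_delay` critic steps."""
    return step % policy_delay == 0


def soft_update(target: list, source: Sequence[float], tau: float = 0.01) -> None:
    """Polyak-average source parameters into target parameters, in place."""
    for i, s in enumerate(source):
        target[i] = tau * s + (1.0 - tau) * target[i]


def count_updates(num_steps: int, policy_delay: int = 2) -> Tuple[int, int]:
    """Return (critic_updates, actor_updates) for `num_steps` training steps."""
    critic_updates = actor_updates = 0
    for step in range(1, num_steps + 1):
        critic_updates += 1  # both critics train on every step
        if should_update_actor(step, policy_delay):
            actor_updates += 1  # actor trains less often, against a more settled critic
    return critic_updates, actor_updates


if __name__ == "__main__":
    # Toy target critics returning fixed values; the target uses the lower one.
    y = clipped_double_q_target(
        reward=1.0, done=False, next_joint_obs=[0.0], next_joint_action=[0.0],
        target_critic_1=lambda o, a: 10.0, target_critic_2=lambda o, a: 8.0, gamma=0.5,
    )
    print(y)  # 1.0 + 0.5 * min(10.0, 8.0) = 5.0
    print(count_updates(10, policy_delay=2))  # (10, 5)
```

Because the minimum of two independently trained estimates tends to err low rather than high, the target is conservative; the delay gives the critics time to settle before the actor follows their gradient.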

Original language	English
Title of host publication	Proceedings of the 2021 IEEE International Conference on Progress in Informatics and Computing, PIC 2021
Editors	Yinglin Wang, Zheying Zhang
Publisher	Institute of Electrical and Electronics Engineers Inc.
Pages	48-52
Number of pages	5
ISBN (Electronic)	9781665426558
DOIs	https://doi.org/10.1109/PIC53636.2021.9687069
State	Published - 2021
Event	8th IEEE International Conference on Progress in Informatics and Computing, PIC 2021 - Virtual, Online, China
Duration	17 Dec 2021 → 19 Dec 2021

Publication series

Name	Proceedings of the 2021 IEEE International Conference on Progress in Informatics and Computing, PIC 2021

Conference

Conference	8th IEEE International Conference on Progress in Informatics and Computing, PIC 2021
Country/Territory	China
City	Virtual, Online
Period	17/12/21 → 19/12/21

Keywords

Deep learning
multi-agent system
neural networks
overestimation
Reinforcement learning

Cite this

Zhan, M., Chen, J., Du, C., & Duan, Y. (2021). Twin Delayed Multi-Agent Deep Deterministic Policy Gradient. In Y. Wang, & Z. Zhang (Eds.), Proceedings of the 2021 IEEE International Conference on Progress in Informatics and Computing, PIC 2021 (pp. 48-52). (Proceedings of the 2021 IEEE International Conference on Progress in Informatics and Computing, PIC 2021). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/PIC53636.2021.9687069

Zhan, Mengying ; Chen, Jinchao ; Du, Chenglie et al. / Twin Delayed Multi-Agent Deep Deterministic Policy Gradient. Proceedings of the 2021 IEEE International Conference on Progress in Informatics and Computing, PIC 2021. editor / Yinglin Wang ; Zheying Zhang. Institute of Electrical and Electronics Engineers Inc., 2021. pp. 48-52 (Proceedings of the 2021 IEEE International Conference on Progress in Informatics and Computing, PIC 2021).

@inproceedings{dfdd0d890400415b8ae73abe0dc80bd2,

title = "Twin Delayed Multi-Agent Deep Deterministic Policy Gradient",

abstract = "Recently, reinforcement learning has achieved remarkable results in natural science, engineering, medicine, and operational research. Reinforcement learning addresses sequential decision problems and optimizes long-term returns, a view that is critical to finding optimal solutions to many problems. Existing multi-agent reinforcement learning algorithms, however, tend to overestimate Q values. Few studies have examined overestimation in multi-agent reinforcement learning, even though it degrades learning efficiency. Building on the traditional multi-agent reinforcement learning algorithm, this paper improves the actor and critic networks, mitigates the overestimation of Q values, and adopts delayed updates to make actor training more stable. To test the effectiveness of the algorithm structure, the modified method is compared with the traditional MADDPG, DDPG, and DQN methods in a simulation environment.",

keywords = "Deep learning, multi-agent system, neural networks, overestimation, Reinforcement learning",

author = "Mengying Zhan and Jinchao Chen and Chenglie Du and Yuxin Duan",

note = "Publisher Copyright: {\textcopyright} 2021 IEEE.; 8th IEEE International Conference on Progress in Informatics and Computing, PIC 2021 ; Conference date: 17-12-2021 Through 19-12-2021",

year = "2021",

doi = "10.1109/PIC53636.2021.9687069",

language = "English",

series = "Proceedings of the 2021 IEEE International Conference on Progress in Informatics and Computing, PIC 2021",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

pages = "48--52",

editor = "Yinglin Wang and Zheying Zhang",

booktitle = "Proceedings of the 2021 IEEE International Conference on Progress in Informatics and Computing, PIC 2021",

}

Zhan, M, Chen, J, Du, C & Duan, Y 2021, Twin Delayed Multi-Agent Deep Deterministic Policy Gradient. in Y Wang & Z Zhang (eds), Proceedings of the 2021 IEEE International Conference on Progress in Informatics and Computing, PIC 2021. Proceedings of the 2021 IEEE International Conference on Progress in Informatics and Computing, PIC 2021, Institute of Electrical and Electronics Engineers Inc., pp. 48-52, 8th IEEE International Conference on Progress in Informatics and Computing, PIC 2021, Virtual, Online, China, 17/12/21. https://doi.org/10.1109/PIC53636.2021.9687069

Twin Delayed Multi-Agent Deep Deterministic Policy Gradient. / Zhan, Mengying; Chen, Jinchao; Du, Chenglie et al.
Proceedings of the 2021 IEEE International Conference on Progress in Informatics and Computing, PIC 2021. ed. / Yinglin Wang; Zheying Zhang. Institute of Electrical and Electronics Engineers Inc., 2021. p. 48-52 (Proceedings of the 2021 IEEE International Conference on Progress in Informatics and Computing, PIC 2021).

TY - GEN

T1 - Twin Delayed Multi-Agent Deep Deterministic Policy Gradient

AU - Zhan, Mengying

AU - Chen, Jinchao

AU - Du, Chenglie

AU - Duan, Yuxin

PY - 2021

Y1 - 2021

N2 - Recently, reinforcement learning has achieved remarkable results in natural science, engineering, medicine, and operational research. Reinforcement learning addresses sequential decision problems and optimizes long-term returns, a view that is critical to finding optimal solutions to many problems. Existing multi-agent reinforcement learning algorithms, however, tend to overestimate Q values. Few studies have examined overestimation in multi-agent reinforcement learning, even though it degrades learning efficiency. Building on the traditional multi-agent reinforcement learning algorithm, this paper improves the actor and critic networks, mitigates the overestimation of Q values, and adopts delayed updates to make actor training more stable. To test the effectiveness of the algorithm structure, the modified method is compared with the traditional MADDPG, DDPG, and DQN methods in a simulation environment.

AB - Recently, reinforcement learning has achieved remarkable results in natural science, engineering, medicine, and operational research. Reinforcement learning addresses sequential decision problems and optimizes long-term returns, a view that is critical to finding optimal solutions to many problems. Existing multi-agent reinforcement learning algorithms, however, tend to overestimate Q values. Few studies have examined overestimation in multi-agent reinforcement learning, even though it degrades learning efficiency. Building on the traditional multi-agent reinforcement learning algorithm, this paper improves the actor and critic networks, mitigates the overestimation of Q values, and adopts delayed updates to make actor training more stable. To test the effectiveness of the algorithm structure, the modified method is compared with the traditional MADDPG, DDPG, and DQN methods in a simulation environment.

KW - Deep learning

KW - multi-agent system

KW - neural networks

KW - overestimation

KW - Reinforcement learning

UR - http://www.scopus.com/inward/record.url?scp=85125814829&partnerID=8YFLogxK

U2 - 10.1109/PIC53636.2021.9687069

DO - 10.1109/PIC53636.2021.9687069

M3 - Conference contribution

AN - SCOPUS:85125814829

T3 - Proceedings of the 2021 IEEE International Conference on Progress in Informatics and Computing, PIC 2021

SP - 48

EP - 52

BT - Proceedings of the 2021 IEEE International Conference on Progress in Informatics and Computing, PIC 2021

A2 - Wang, Yinglin

A2 - Zhang, Zheying

PB - Institute of Electrical and Electronics Engineers Inc.

T2 - 8th IEEE International Conference on Progress in Informatics and Computing, PIC 2021

Y2 - 17 December 2021 through 19 December 2021

ER -

Zhan M, Chen J, Du C, Duan Y. Twin Delayed Multi-Agent Deep Deterministic Policy Gradient. In Wang Y, Zhang Z, editors, Proceedings of the 2021 IEEE International Conference on Progress in Informatics and Computing, PIC 2021. Institute of Electrical and Electronics Engineers Inc. 2021. p. 48-52. (Proceedings of the 2021 IEEE International Conference on Progress in Informatics and Computing, PIC 2021). doi: 10.1109/PIC53636.2021.9687069
