TY - JOUR
T1 - Expert System-Based Multiagent Deep Deterministic Policy Gradient for Swarm Robot Decision Making
AU - Wang, Zhen
AU - Jin, Xiaoyue
AU - Zhang, Tao
AU - Li, Jiahao
AU - Yu, Dengxiu
AU - Cheong, Kang Hao
AU - Chen, C. L. Philip
N1 - Publisher Copyright:
© 2013 IEEE.
PY - 2024/3/1
Y1 - 2024/3/1
N2 - In this article, an expert system-based multiagent deep deterministic policy gradient (ESB-MADDPG) is proposed to realize decision making for swarm robots. Multiagent deep deterministic policy gradient (MADDPG) is a multiagent reinforcement learning algorithm that utilizes a centralized critic within the actor-critic learning framework, which can reduce policy gradient variance. However, traditional MADDPG is difficult to apply to swarm robots directly because its path planning is time-consuming, making a faster method for gathering trajectories necessary. Besides, the trajectories obtained by MADDPG are composed of straight-line segments, which are not smooth and are therefore difficult for the swarm robots to track. This article aims to solve these problems by closing the above gaps. First, the ESB-MADDPG method is proposed to improve the training speed, and smooth processing of the trajectory is designed within ESB-MADDPG. Furthermore, the expert system also provides many trained offline trajectories, which avoids retraining each time the swarm robots are used. Given the gathered trajectories, the model predictive control (MPC) algorithm is introduced to realize optimal tracking of the offline trajectories. Simulation results show that combining ESB-MADDPG and MPC can realize swarm robot decision making efficiently.
AB - In this article, an expert system-based multiagent deep deterministic policy gradient (ESB-MADDPG) is proposed to realize decision making for swarm robots. Multiagent deep deterministic policy gradient (MADDPG) is a multiagent reinforcement learning algorithm that utilizes a centralized critic within the actor-critic learning framework, which can reduce policy gradient variance. However, traditional MADDPG is difficult to apply to swarm robots directly because its path planning is time-consuming, making a faster method for gathering trajectories necessary. Besides, the trajectories obtained by MADDPG are composed of straight-line segments, which are not smooth and are therefore difficult for the swarm robots to track. This article aims to solve these problems by closing the above gaps. First, the ESB-MADDPG method is proposed to improve the training speed, and smooth processing of the trajectory is designed within ESB-MADDPG. Furthermore, the expert system also provides many trained offline trajectories, which avoids retraining each time the swarm robots are used. Given the gathered trajectories, the model predictive control (MPC) algorithm is introduced to realize optimal tracking of the offline trajectories. Simulation results show that combining ESB-MADDPG and MPC can realize swarm robot decision making efficiently.
KW - Model predictive control (MPC)
KW - multiagent deep deterministic policy gradient (MADDPG)
KW - swarm robot decision making
UR - http://www.scopus.com/inward/record.url?scp=85146249164&partnerID=8YFLogxK
U2 - 10.1109/TCYB.2022.3228578
DO - 10.1109/TCYB.2022.3228578
M3 - Article
C2 - 37015659
AN - SCOPUS:85146249164
SN - 2168-2267
VL - 54
SP - 1614
EP - 1624
JO - IEEE Transactions on Cybernetics
JF - IEEE Transactions on Cybernetics
IS - 3
ER -