ME-MADDPG: An efficient learning-based motion planning method for multiple agents in complex environments

Kaifang Wan, Dingwei Wu, Bo Li, Xiaoguang Gao, Zijian Hu, Daqing Chen

Research output: Contribution to journal › Article › peer-review

38 Citations (Scopus)

Abstract

Developing efficient motion policies for multiple agents is challenging in decentralized dynamic settings, where each agent plans its own path without knowing the policies of the other agents involved. This paper presents an efficient learning-based motion planning method for multiagent systems. It adopts the framework of multiagent deep deterministic policy gradient (MADDPG) to directly map partially observed information to motion commands for multiple agents. To improve the sample efficiency of MADDPG, and thereby train more capable agents that can adapt to more complex environments, a mixed experience (ME) strategy is introduced into MADDPG, yielding the proposed ME-MADDPG algorithm. The ME strategy comprises three specific mechanisms: (1) an artificial potential field-based sample generator that produces high-quality samples in the early training stage; (2) a dynamic mixed sampling strategy that blends training data from different sources in a variable proportion; and (3) a delayed learning scheme that stabilizes the training of the multiple agents. A series of experiments has been conducted to verify the performance of the proposed ME-MADDPG algorithm. Compared with MADDPG, the proposed algorithm significantly improves both convergence speed and convergence quality during training, and it also shows better efficiency and adaptability in complex dynamic environments when applied to multiagent motion planning.
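The abstract describes the dynamic mixed sampling mechanism only at a high level. As an illustration of the general idea, the sketch below is a minimal Python example, assuming a replay buffer that stores APF-generated "expert" transitions separately from the agents' own exploration transitions and draws minibatches with a linearly decaying expert proportion; the class and parameter names (MixedReplayBuffer, initial_expert_ratio, decay_steps) are hypothetical and not taken from the paper.

```python
import random


class MixedReplayBuffer:
    """Illustrative mixed-experience buffer (not the paper's implementation).

    Holds APF-generated "expert" transitions and agent-collected transitions
    in separate stores, and samples minibatches that blend the two sources
    with a proportion that decays as training progresses.
    """

    def __init__(self, capacity=100_000, initial_expert_ratio=0.8,
                 decay_steps=50_000, min_expert_ratio=0.0):
        self.capacity = capacity
        self.expert = []   # transitions produced by the APF planner
        self.agent = []    # transitions collected by the learning agents
        self.initial_expert_ratio = initial_expert_ratio
        self.decay_steps = decay_steps
        self.min_expert_ratio = min_expert_ratio

    def add(self, transition, from_expert=False):
        # Append to the appropriate store and evict the oldest entry if full.
        store = self.expert if from_expert else self.agent
        store.append(transition)
        if len(store) > self.capacity:
            store.pop(0)

    def expert_ratio(self, step):
        # Linear decay: rely on APF samples early, on self-collected samples later.
        frac = max(0.0, 1.0 - step / self.decay_steps)
        return max(self.min_expert_ratio, self.initial_expert_ratio * frac)

    def sample(self, batch_size, step):
        # Mix the two sources according to the current (decayed) proportion.
        ratio = self.expert_ratio(step)
        n_expert = min(int(batch_size * ratio), len(self.expert))
        n_agent = min(batch_size - n_expert, len(self.agent))
        batch = random.sample(self.expert, n_expert) + random.sample(self.agent, n_agent)
        random.shuffle(batch)
        return batch
```

Under this reading, the delayed learning scheme would simply gate how often a minibatch is drawn and a MADDPG gradient step is taken (for example, only every few environment steps), which is orthogonal to the buffer itself.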

Original language: English
Pages (from-to): 2393-2427
Number of pages: 35
Journal: International Journal of Intelligent Systems
Volume: 37
Issue number: 3
DOI
Publication status: Published - Mar 2022
