Multi-UAV Adaptive Cooperative Formation Trajectory Planning Based on an Improved MATD3 Algorithm of Deep Reinforcement Learning

Xiaojun Xing; Zhiwei Zhou; Yan Li; Bing Xiao; Yilin Xun

doi:10.1109/TVT.2024.3389555

School of Automation

Northwestern Polytechnical University, Xi'an

Research output: Contribution to journal › Article › Peer-review

21 Citations (Scopus)

Abstract

Multi-unmanned aerial vehicle (multi-UAV) cooperative trajectory planning is an extremely challenging problem in the UAV research field because of its NP-hard nature, collision-avoidance constraints, close-formation requirements, consensus convergence, and high-dimensional action space. The difficulty of multi-UAV trajectory planning increases considerably when there are complex obstacles and narrow passages in unknown environments. Accordingly, this article proposes a novel multi-UAV adaptive cooperative formation trajectory planning approach for unknown obstacle environments, based on an improved deep reinforcement learning algorithm. The approach introduces a long short-term memory (LSTM) recurrent neural network (RNN) into the environment-perception end of the multi-agent twin delayed deep deterministic policy gradient (MATD3) network, and develops an improved potential field-based dense reward function; these strengthen policy learning efficiency and accelerate convergence, respectively. Moreover, a hierarchical deep reinforcement learning training mechanism, comprising an adaptive formation layer, a trajectory planning layer, and an action execution layer, is implemented to explore an optimal trajectory planning policy. Additionally, an adaptive formation maintenance and transformation strategy is presented so that the UAV swarm can adapt to environments with narrow passages. Simulation results show that the proposed approach outperforms multi-agent deep deterministic policy gradient (MADDPG) and MATD3 in policy learning efficiency, optimality of the trajectory planning policy, and adaptability to narrow passages.
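The abstract does not give the form of the improved potential field-based dense reward. As a rough illustration only, the sketch below assumes a classic artificial potential field (a quadratic attractive term toward the goal plus a repulsive term that is active only within an influence radius d0 of each obstacle) and turns it into a dense per-step reward through potential-based shaping, F = gamma * Phi(s') - Phi(s) with Phi = -U. All function names and gains (k_att, k_rep, d0, gamma) are hypothetical and are not taken from the paper.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]  # hypothetical 2-D position type for illustration


def attractive_potential(pos: Point, goal: Point, k_att: float = 1.0) -> float:
    """Quadratic attraction toward the goal: U_att = 0.5 * k_att * d^2."""
    return 0.5 * k_att * math.dist(pos, goal) ** 2


def repulsive_potential(pos: Point, obstacles: Sequence[Point],
                        k_rep: float = 1.0, d0: float = 2.0) -> float:
    """Classic APF repulsion, nonzero only when an obstacle is closer than d0."""
    u = 0.0
    for obs in obstacles:
        d = max(math.dist(pos, obs), 1e-6)  # guard against division by zero
        if d < d0:
            u += 0.5 * k_rep * (1.0 / d - 1.0 / d0) ** 2
    return u


def total_potential(pos: Point, goal: Point, obstacles: Sequence[Point],
                    k_att: float = 1.0, k_rep: float = 1.0, d0: float = 2.0) -> float:
    """Sum of attractive and repulsive potentials at one position."""
    return (attractive_potential(pos, goal, k_att)
            + repulsive_potential(pos, obstacles, k_rep, d0))


def dense_reward(prev_pos: Point, pos: Point, goal: Point,
                 obstacles: Sequence[Point], gamma: float = 0.99,
                 **gains: float) -> float:
    """Potential-based shaping reward F = gamma * Phi(s') - Phi(s), Phi = -U.

    Moving toward the goal lowers U (positive reward); moving into an
    obstacle's influence radius raises U (smaller or negative reward).
    """
    phi_prev = -total_potential(prev_pos, goal, obstacles, **gains)
    phi_next = -total_potential(pos, goal, obstacles, **gains)
    return gamma * phi_next - phi_prev
```

For example, `dense_reward((0.0, 0.0), (1.0, 0.0), goal=(5.0, 0.0), obstacles=[], gamma=1.0)` returns 4.5, since the attractive potential drops from 12.5 to 8.0. Shaping of this potential-difference form is known to leave the optimal policy of the underlying MDP unchanged while giving the agent a nonzero signal at every step instead of a sparse goal-reached reward; whether the paper's improved reward keeps that property is not stated in the abstract.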

Original language	English
Pages (from-to)	12484-12499
Number of pages	16
Journal	IEEE Transactions on Vehicular Technology
Volume	73
Issue number	9
DOI	https://doi.org/10.1109/TVT.2024.3389555
Publication status	Published - 2024

Cite this

@article{d239c1a2260e41e2956937abe1ab174c,

title = "Multi-UAV Adaptive Cooperative Formation Trajectory Planning Based on an Improved MATD3 Algorithm of Deep Reinforcement Learning",

abstract = "Multi-unmanned aerial vehicle (multi-UAV) cooperative trajectory planning is an extremely challenging issue in UAV research field due to its NP-hard characteristic, collision avoiding constraints, close formation requirement, consensus convergence and high-dimensional action space etc. Especially, the difficulty of multi-UAV trajectory planning will boost comparatively when there are complex obstacles and narrow passages in unknown environments. Accordingly, a novel multi-UAV adaptive cooperative formation trajectory planning approach is proposed in this article based on an improved deep reinforcement learning algorithm in unknown obstacle environments, which innovatively introduces long short-Term memory (LSTM) recurrent neural network (RNN) into the environment perception end of multi-Agent twin delayed deep deterministic policy gradient (MATD3) network, and develops an improved potential field-based dense reward function to strengthen the policy learning efficiency and accelerates the convergence respectively. Moreover, a hierarchical deep reinforcement learning training mechanism, including adaptive formation layer, trajectory planning layer and action execution layer is implemented to explore an optimal trajectory planning policy. Additionally, an adaptive formation maintaining and transformation strategy is presented for UAV swarm to comply with the environment with narrow passages. Simulation results show that the proposed approach is better in policy learning efficiency, optimality of trajectory planning policy and adaptability to narrow passages than that using multi-Agent deep deterministic policy gradient (MADDPG) and MATD3.",

keywords = "adaptive formation strategy, deep reinforcement learning, hierarchical training mechanism, Multi-unmanned aerial vehicle (multi-UAV) cooperative formation trajectory planning, potential field-based dense reward",

author = "Xiaojun Xing and Zhiwei Zhou and Yan Li and Bing Xiao and Yilin Xun",

note = "Publisher Copyright: {\textcopyright} 1967-2012 IEEE.",

year = "2024",

doi = "10.1109/TVT.2024.3389555",

language = "英语",

volume = "73",

pages = "12484--12499",

journal = "IEEE Transactions on Vehicular Technology",

issn = "0018-9545",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

number = "9",

}

TY  - JOUR
T1  - Multi-UAV Adaptive Cooperative Formation Trajectory Planning Based on an Improved MATD3 Algorithm of Deep Reinforcement Learning
AU  - Xing, Xiaojun
AU  - Zhou, Zhiwei
AU  - Li, Yan
AU  - Xiao, Bing
AU  - Xun, Yilin
PY  - 2024
Y1  - 2024
N2  - Multi-unmanned aerial vehicle (multi-UAV) cooperative trajectory planning is an extremely challenging issue in UAV research field due to its NP-hard characteristic, collision avoiding constraints, close formation requirement, consensus convergence and high-dimensional action space etc. Especially, the difficulty of multi-UAV trajectory planning will boost comparatively when there are complex obstacles and narrow passages in unknown environments. Accordingly, a novel multi-UAV adaptive cooperative formation trajectory planning approach is proposed in this article based on an improved deep reinforcement learning algorithm in unknown obstacle environments, which innovatively introduces long short-Term memory (LSTM) recurrent neural network (RNN) into the environment perception end of multi-Agent twin delayed deep deterministic policy gradient (MATD3) network, and develops an improved potential field-based dense reward function to strengthen the policy learning efficiency and accelerates the convergence respectively. Moreover, a hierarchical deep reinforcement learning training mechanism, including adaptive formation layer, trajectory planning layer and action execution layer is implemented to explore an optimal trajectory planning policy. Additionally, an adaptive formation maintaining and transformation strategy is presented for UAV swarm to comply with the environment with narrow passages. Simulation results show that the proposed approach is better in policy learning efficiency, optimality of trajectory planning policy and adaptability to narrow passages than that using multi-Agent deep deterministic policy gradient (MADDPG) and MATD3.
AB  - Multi-unmanned aerial vehicle (multi-UAV) cooperative trajectory planning is an extremely challenging issue in UAV research field due to its NP-hard characteristic, collision avoiding constraints, close formation requirement, consensus convergence and high-dimensional action space etc. Especially, the difficulty of multi-UAV trajectory planning will boost comparatively when there are complex obstacles and narrow passages in unknown environments. Accordingly, a novel multi-UAV adaptive cooperative formation trajectory planning approach is proposed in this article based on an improved deep reinforcement learning algorithm in unknown obstacle environments, which innovatively introduces long short-Term memory (LSTM) recurrent neural network (RNN) into the environment perception end of multi-Agent twin delayed deep deterministic policy gradient (MATD3) network, and develops an improved potential field-based dense reward function to strengthen the policy learning efficiency and accelerates the convergence respectively. Moreover, a hierarchical deep reinforcement learning training mechanism, including adaptive formation layer, trajectory planning layer and action execution layer is implemented to explore an optimal trajectory planning policy. Additionally, an adaptive formation maintaining and transformation strategy is presented for UAV swarm to comply with the environment with narrow passages. Simulation results show that the proposed approach is better in policy learning efficiency, optimality of trajectory planning policy and adaptability to narrow passages than that using multi-Agent deep deterministic policy gradient (MADDPG) and MATD3.
KW  - adaptive formation strategy
KW  - deep reinforcement learning
KW  - hierarchical training mechanism
KW  - Multi-unmanned aerial vehicle (multi-UAV) cooperative formation trajectory planning
KW  - potential field-based dense reward
UR  - http://www.scopus.com/inward/record.url?scp=85190745304&partnerID=8YFLogxK
U2  - 10.1109/TVT.2024.3389555
DO  - 10.1109/TVT.2024.3389555
M3  - Article
AN  - SCOPUS:85190745304
SN  - 0018-9545
VL  - 73
SP  - 12484
EP  - 12499
JO  - IEEE Transactions on Vehicular Technology
JF  - IEEE Transactions on Vehicular Technology
IS  - 9
ER  - 
