Multi-UAV Rendezvous Trajectory Planning Based on Improved MADDPG Algorithm in Complex Dynamic Obstacle Environments

  • Xiaojun Xing
  • Yuanqiang Ma
  • Yichen Lei
  • Yan Li
  • Bing Xiao

Research output: Contribution to journal › Article › peer-review

Abstract

Traditional trajectory planning algorithms for multiple UAVs struggle to establish cooperative mechanisms and adapt poorly to dynamic obstacle environments. To address these limitations, an enhanced reinforcement learning algorithm, based on the multi-agent deep deterministic policy gradient (MADDPG) algorithm and an attention mechanism, is proposed for multi-UAV rendezvous trajectory planning in unknown complex environments. First, an attention mechanism is introduced into the centralized critic network of MADDPG, enabling the model to dynamically adjust its attention in complex environments and improving learning efficiency. Second, a dense reward function based on guiding points is developed, combining attractive and repulsive forces; it addresses the sparse-reward problem, accelerates convergence, and improves policy learning efficiency. Third, an Ornstein-Uhlenbeck (OU) noise network is incorporated to balance exploration and exploitation during training. Finally, the algorithm is compared with MADDPG, MATD3, and IDDPG in static obstacle environments, dynamic obstacle environments, and extended composite scenarios. The results show that the improved algorithm effectively avoids collisions, completes the rendezvous at the target point, and achieves the fewest decision steps, the shortest trajectory length, and the highest rendezvous success rate. In particular, in scenarios with multiple dynamic obstacles, the improved algorithm adjusts the UAV flight paths in real time and successfully avoids all dynamic obstacles.
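
To make two of the ingredients named in the abstract more concrete, the following is a minimal Python sketch and not the authors' implementation. It shows a discretized Ornstein-Uhlenbeck noise process for exploration and a potential-field style dense reward that attracts a UAV toward a guiding point and penalizes proximity to obstacles. The names `OUNoise` and `dense_reward`, the parameters `theta`, `sigma`, `dt`, `k_att`, `k_rep`, and `d_safe`, and the exact form of the reward are all assumptions made for illustration; the abstract does not specify them.

```python
import math
import random


class OUNoise:
    """Discretized Ornstein-Uhlenbeck process (illustrative parameters).

    Update rule: x <- x + theta * (mu - x) * dt + sigma * sqrt(dt) * N(0, 1)
    """

    def __init__(self, dim, mu=0.0, theta=0.15, sigma=0.2, dt=1e-2, seed=None):
        self.dim = dim
        self.mu = mu
        self.theta = theta
        self.sigma = sigma
        self.dt = dt
        self.rng = random.Random(seed)
        self.reset()

    def reset(self):
        # Restart the process at its long-run mean (e.g. at episode start).
        self.state = [self.mu] * self.dim

    def sample(self):
        # Mean-reverting drift plus Gaussian diffusion, per action dimension.
        self.state = [
            x
            + self.theta * (self.mu - x) * self.dt
            + self.sigma * math.sqrt(self.dt) * self.rng.gauss(0.0, 1.0)
            for x in self.state
        ]
        return list(self.state)


def dense_reward(pos, guide, obstacles, k_att=1.0, k_rep=1.0, d_safe=2.0):
    """Hypothetical dense reward: attraction to a guiding point, repulsion from obstacles.

    The functional form is an assumption borrowed from artificial potential
    fields; it is not taken from the paper.
    """
    # Attractive term: the closer to the guiding point, the higher the reward.
    reward = -k_att * math.dist(pos, guide)
    # Repulsive term: penalize only obstacles closer than the safety distance.
    for obs in obstacles:
        d = math.dist(pos, obs)
        if d < d_safe:
            reward -= k_rep * (1.0 / max(d, 1e-6) - 1.0 / d_safe)
    return reward
```

In a training loop, a sample from `OUNoise` would typically be added to each actor's deterministic action before it is applied, and `dense_reward` would be evaluated at every step so that the agents receive feedback long before they reach the rendezvous point.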

Original language: English
Journal: IEEE Transactions on Vehicular Technology
State: Accepted/In press - 2025

Keywords

  • Multi-UAV trajectory planning
  • attention mechanism
  • dense reward
  • reinforcement learning
