TY - JOUR
T1 - Multi-UAV Rendezvous Trajectory Planning Based on Improved MADDPG Algorithm in Complex Dynamic Obstacle Environments
AU - Xing, Xiaojun
AU - Ma, Yuanqiang
AU - Lei, Yichen
AU - Li, Yan
AU - Xiao, Bing
N1 - Publisher Copyright:
© 1967-2012 IEEE.
PY - 2025
Y1 - 2025
N2 - Traditional trajectory planning algorithms for multi-UAV systems face challenges such as the difficulty of establishing cooperative mechanisms and poor adaptability to dynamic obstacle environments. To address these limitations, an enhanced reinforcement learning algorithm, based on the multi-agent deep deterministic policy gradient (MADDPG) algorithm and an attention mechanism, is proposed for multi-UAV rendezvous trajectory planning in unknown complex environments. First, an attention mechanism from deep learning is introduced into the centralized critic network of MADDPG, enabling the model to dynamically adjust its attention in complex environments and improve learning efficiency. Second, a dense reward function based on guiding points is developed, combining attractive and repulsive forces; it effectively addresses the sparse-reward problem, accelerates the algorithm's convergence, and improves policy learning efficiency. Third, Ornstein-Uhlenbeck (OU) noise is incorporated to better balance exploration and exploitation during training. Finally, the proposed algorithm is compared with MADDPG, MATD3, and IDDPG in static obstacle, dynamic obstacle, and extended composite scenarios. The results show that the improved algorithm effectively avoids collisions, successfully rendezvouses at the target point, and achieves the fewest decision steps, the shortest trajectory length, and the highest rendezvous success rate. In particular, in scenarios with multiple dynamic obstacles, the improved algorithm adjusts UAV flight paths in real time and successfully avoids all dynamic obstacles.
KW - Multi-UAV trajectory planning
KW - attention mechanism
KW - dense reward
KW - reinforcement learning
UR - https://www.scopus.com/pages/publications/105020312714
U2 - 10.1109/TVT.2025.3624052
DO - 10.1109/TVT.2025.3624052
M3 - Article
AN - SCOPUS:105020312714
SN - 0018-9545
JO - IEEE Transactions on Vehicular Technology
JF - IEEE Transactions on Vehicular Technology
ER -