An Improved Method towards Multi-UAV Autonomous Navigation Using Deep Reinforcement Learning

Dingwei Wu, Kaifang Wan, Jianqiang Tang, Xiaoguang Gao, Yiwei Zhai, Zhaohui Qi

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

11 Scopus citations

Abstract

Autonomous navigation is a key technology for multi-UAV systems, and deep reinforcement learning can give UAVs strong autonomous decision-making capabilities. To improve the convergence speed and stability of reinforcement learning, this paper proposes PER-MADDPG, a multi-agent deep deterministic policy gradient (MADDPG) algorithm based on prioritized experience replay. The algorithm selects higher-priority samples with higher probability for parameter updates, which speeds up convergence. In addition, UAV actions are generated using parameter noise, which improves the stability and robustness of the algorithm. Experiments show that PER-MADDPG converges quickly to good results and has excellent autonomous navigation capabilities.
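
The abstract does not say how priorities are computed or how the parameter noise is applied, so the following is only a minimal sketch of the two general ideas, not the authors' implementation. It assumes proportional prioritization by absolute TD error (with the exponents `alpha` and `beta` commonly used in prioritized experience replay) and Gaussian noise added directly to a flat list of policy parameters. All function names, parameter names, and numeric values are hypothetical.

```python
import random


def priorities_to_probabilities(td_errors, alpha=0.6, eps=1e-6):
    """Map TD errors to sampling probabilities: p_i = (|d_i| + eps)^alpha / sum_j(...).

    Assumption: priority is the absolute TD error; the paper may define it differently.
    """
    scaled = [(abs(d) + eps) ** alpha for d in td_errors]
    total = sum(scaled)
    return [s / total for s in scaled]


def sample_indices(probabilities, batch_size, rng):
    """Draw transition indices (with replacement) in proportion to their priority."""
    return rng.choices(range(len(probabilities)), weights=probabilities, k=batch_size)


def importance_weights(probabilities, indices, beta=0.4):
    """Importance-sampling weights that offset the non-uniform sampling, scaled to max 1."""
    n = len(probabilities)
    raw = [(n * probabilities[i]) ** (-beta) for i in indices]
    top = max(raw)
    return [w / top for w in raw]


def perturb_parameters(params, sigma, rng):
    """Parameter noise: add zero-mean Gaussian noise to each policy parameter.

    Assumption: the policy is represented here as a flat list of floats.
    """
    return [p + rng.gauss(0.0, sigma) for p in params]


# Hypothetical usage with made-up TD errors for four stored transitions.
rng = random.Random(0)
td_errors = [0.1, 2.0, 0.5, 0.05]
probs = priorities_to_probabilities(td_errors)
batch = sample_indices(probs, batch_size=3, rng=rng)
weights = importance_weights(probs, batch)
noisy_policy = perturb_parameters([0.2, -0.4, 1.0], sigma=0.05, rng=rng)
```

In this sketch the transition with the largest TD error is the most likely to be replayed, and the importance weights shrink its contribution to the update so that the gradient estimate is not dominated by frequently sampled transitions. Perturbing parameters instead of actions makes the exploration consistent across a whole episode, which is one common motivation for parameter noise.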

Original language: English
Title of host publication: 2022 7th International Conference on Control and Robotics Engineering, ICCRE 2022
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 96-101
Number of pages: 6
ISBN (Electronic): 9781665468404
State: Published - 2022
Event: 7th International Conference on Control and Robotics Engineering, ICCRE 2022 - Beijing, China
Duration: 15 Apr 2022 to 17 Apr 2022

Publication series

Name: 2022 7th International Conference on Control and Robotics Engineering, ICCRE 2022

Conference

Conference: 7th International Conference on Control and Robotics Engineering, ICCRE 2022
Country/Territory: China
City: Beijing
Period: 15/04/22 to 17/04/22

Keywords

  • autonomous navigation
  • MADDPG
  • multi-UAV
  • prioritized experience replay
  • reinforcement learning
