An Improved Prioritized DDPG Based on Fractional-Order Learning Scheme

Quan Yong Fan, Meiying Cai, Bin Xu

Research output: Contribution to journal › Article › peer-review


Abstract

Although the deep deterministic policy gradient (DDPG) algorithm has attracted widespread attention for its powerful functionality and applicability to large-scale continuous control, it suffers from problems such as low sample utilization efficiency and insufficient exploration. Therefore, an improved DDPG is presented in this article to overcome these challenges. First, an optimizer based on the fractional-order gradient is introduced into the algorithm's networks, which is conducive to increasing the speed and accuracy of training convergence. On this basis, high-value experience replay based on weight-changed priority is proposed to improve sample utilization efficiency, and, to explore the environment more thoroughly, an optimized exploration strategy for the boundary of the action space is adopted. Finally, our proposed method is tested in experiments on the Gym and PyBullet platforms. According to the results, our method speeds up the learning process and obtains higher average rewards than the compared algorithms.
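The abstract names three ingredients: a fractional-order gradient optimizer, prioritized replay with a weight-changed priority, and a boundary-aware exploration strategy. Since the abstract gives no formulas, the sketch below is only a rough illustration, not the authors' method: fractional_step implements a generic Caputo-style fractional-order gradient update (a common formulation in the fractional-gradient literature), and sample_indices uses plain proportional prioritized sampling as a stand-in for the paper's weight-changed priority rule. All names, the toy objective, and the hyperparameters (alpha, lr) are assumptions for illustration.

```python
import math
import numpy as np

def fractional_step(theta, theta_prev, grad, lr=0.01, alpha=0.9, eps=1e-8):
    """One Caputo-style fractional-order gradient step of order alpha in (0, 1).

    The ordinary gradient is scaled by
    |theta - theta_prev|^(1 - alpha) / Gamma(2 - alpha);
    the rule reduces to plain gradient descent as alpha -> 1.
    Hypothetical illustration; not the optimizer specified in the paper.
    """
    scale = (np.abs(theta - theta_prev) + eps) ** (1.0 - alpha) / math.gamma(2.0 - alpha)
    return theta - lr * grad * scale

def sample_indices(priorities, batch_size, rng):
    """Sample replay-buffer indices with probability proportional to priority.

    Plain proportional prioritized replay, used here as a stand-in for the
    paper's weight-changed priority rule, which the abstract does not specify.
    """
    p = np.asarray(priorities, dtype=float)
    return rng.choice(len(p), size=batch_size, p=p / p.sum())

# Toy check: minimize f(theta) = 0.5 * ||theta||^2, whose gradient is theta.
theta_prev = np.zeros(3)
theta = np.array([1.0, -2.0, 0.5])
for _ in range(500):
    theta, theta_prev = fractional_step(theta, theta_prev, grad=theta), theta
print(theta)  # drifts toward the minimizer at the origin

rng = np.random.default_rng(0)
print(sample_indices([0.1, 0.5, 2.0, 0.4], batch_size=3, rng=rng))
```

As alpha approaches 1 the scaling factor tends to 1 and the update recovers standard gradient descent, which is why fractional orders slightly below 1 are typically chosen: they reshape the step size based on the distance from the previous iterate.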

Original language: English
Pages (from-to): 6873-6882
Number of pages: 10
Journal: IEEE Transactions on Neural Networks and Learning Systems
Volume: 36
Issue number: 4
DOIs
State: Published - 2025

Keywords

  • Deep deterministic policy gradient (DDPG)
  • fractional-order gradient
  • prioritized experience replay
  • reinforcement learning (RL)
