An Improved Prioritized DDPG Based on Fractional-Order Learning Scheme

Quan Yong Fan; Meiying Cai; Bin Xu

doi:10.1109/TNNLS.2024.3395508

School of Automation

Northwestern Polytechnical University Xian

Research output: Contribution to journal › Article › peer-review

12 Scopus citations

Abstract

Although the deep deterministic policy gradient (DDPG) algorithm has received widespread attention for its powerful functionality and its applicability to large-scale continuous control, it suffers from low sample utilization efficiency and insufficient exploration. This article presents an improved DDPG to overcome these challenges. First, an optimizer based on the fractional gradient is introduced into the algorithm's network, which is conducive to increasing the speed and accuracy of training convergence. On this basis, a high-value experience replay scheme based on weight-changed priority is proposed to improve sample utilization efficiency, and an optimized exploration strategy for the boundary action space is adopted to explore the environment more thoroughly. Finally, the proposed method is tested in experiments on the Gym and PyBullet platforms. The results show that the method speeds up the learning process and obtains higher average rewards than the other algorithms compared.
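For readers unfamiliar with prioritized experience replay, the following is a minimal sketch of the generic proportional variant, in which transitions with larger temporal-difference error are sampled more often. It is not the weight-changed priority scheme proposed in the article, whose details are not given in the abstract. The class name, the parameters `alpha`, `beta`, and `eps`, and their default values are illustrative assumptions.

```python
import random


class PrioritizedReplayBuffer:
    """Generic proportional prioritized replay (illustrative, not the article's variant)."""

    def __init__(self, capacity, alpha=0.6, eps=1e-6):
        self.capacity = capacity
        self.alpha = alpha    # assumed exponent: 0 gives uniform sampling
        self.eps = eps        # assumed offset that keeps priorities positive
        self.data = []
        self.priorities = []
        self.pos = 0          # next slot to overwrite once the buffer is full

    def add(self, transition):
        # New transitions receive the current maximum priority,
        # so each one is likely to be replayed at least once.
        p = max(self.priorities, default=1.0)
        if len(self.data) < self.capacity:
            self.data.append(transition)
            self.priorities.append(p)
        else:
            self.data[self.pos] = transition
            self.priorities[self.pos] = p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4, rng=random):
        # Sampling probability is proportional to priority ** alpha.
        scaled = [p ** self.alpha for p in self.priorities]
        total = sum(scaled)
        probs = [s / total for s in scaled]
        n = len(self.data)
        idxs = rng.choices(range(n), weights=probs, k=batch_size)
        # Importance-sampling weights correct the bias of non-uniform
        # sampling; they are normalized so the largest weight is 1.
        weights = [(n * probs[i]) ** (-beta) for i in idxs]
        max_w = max(weights)
        weights = [w / max_w for w in weights]
        return idxs, [self.data[i] for i in idxs], weights

    def update(self, idxs, td_errors):
        # After a learning step, priorities track the absolute TD error.
        for i, err in zip(idxs, td_errors):
            self.priorities[i] = abs(err) + self.eps
```

In a DDPG training loop, the critic loss for each sampled transition would typically be multiplied by its returned weight, and `update` would be called with the new TD errors after each gradient step.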

Original language	English
Pages (from-to)	6873-6882
Number of pages	10
Journal	IEEE Transactions on Neural Networks and Learning Systems
Volume	36
Issue number	4
DOIs	https://doi.org/10.1109/TNNLS.2024.3395508
State	Published - 2025

Keywords

Deep deterministic policy gradient (DDPG)
fractional-order gradient
prioritized experience replay
reinforcement learning (RL)

Cite this

@article{f4507751dac0408595b81b39a653d8ee,

title = "An Improved Prioritized DDPG Based on Fractional-Order Learning Scheme",

abstract = "Although the deep deterministic policy gradient (DDPG) algorithm has received widespread attention for its powerful functionality and its applicability to large-scale continuous control, it suffers from low sample utilization efficiency and insufficient exploration. This article presents an improved DDPG to overcome these challenges. First, an optimizer based on the fractional gradient is introduced into the algorithm's network, which is conducive to increasing the speed and accuracy of training convergence. On this basis, a high-value experience replay scheme based on weight-changed priority is proposed to improve sample utilization efficiency, and an optimized exploration strategy for the boundary action space is adopted to explore the environment more thoroughly. Finally, the proposed method is tested in experiments on the Gym and PyBullet platforms. The results show that the method speeds up the learning process and obtains higher average rewards than the other algorithms compared.",

keywords = "Deep deterministic policy gradient (DDPG), fractional-order gradient, prioritized experience replay, reinforcement learning (RL)",

author = "Fan, {Quan Yong} and Meiying Cai and Bin Xu",

note = "Publisher Copyright: {\textcopyright} 2012 IEEE.",

year = "2025",

doi = "10.1109/TNNLS.2024.3395508",

language = "English",

volume = "36",

pages = "6873--6882",

journal = "IEEE Transactions on Neural Networks and Learning Systems",

issn = "2162-237X",

publisher = "IEEE Computational Intelligence Society",

number = "4",

}

TY - JOUR

T1 - An Improved Prioritized DDPG Based on Fractional-Order Learning Scheme

AU - Fan, Quan Yong

AU - Cai, Meiying

AU - Xu, Bin

PY - 2025

Y1 - 2025

N2 - Although the deep deterministic policy gradient (DDPG) algorithm has received widespread attention for its powerful functionality and its applicability to large-scale continuous control, it suffers from low sample utilization efficiency and insufficient exploration. This article presents an improved DDPG to overcome these challenges. First, an optimizer based on the fractional gradient is introduced into the algorithm's network, which is conducive to increasing the speed and accuracy of training convergence. On this basis, a high-value experience replay scheme based on weight-changed priority is proposed to improve sample utilization efficiency, and an optimized exploration strategy for the boundary action space is adopted to explore the environment more thoroughly. Finally, the proposed method is tested in experiments on the Gym and PyBullet platforms. The results show that the method speeds up the learning process and obtains higher average rewards than the other algorithms compared.

KW - Deep deterministic policy gradient (DDPG)

KW - fractional-order gradient

KW - prioritized experience replay

KW - reinforcement learning (RL)

UR - http://www.scopus.com/inward/record.url?scp=105002298514&partnerID=8YFLogxK

U2 - 10.1109/TNNLS.2024.3395508

DO - 10.1109/TNNLS.2024.3395508

M3 - Article

AN - SCOPUS:105002298514

SN - 2162-237X

VL - 36

SP - 6873

EP - 6882

JO - IEEE Transactions on Neural Networks and Learning Systems

JF - IEEE Transactions on Neural Networks and Learning Systems

IS - 4

ER -
