An Improved Prioritized DDPG Based on Fractional-Order Learning Scheme

Quan Yong Fan, Meiying Cai, Bin Xu

Research output: Journal article, peer-reviewed

12 Citations (Scopus)

Abstract

Although the deep deterministic policy gradient (DDPG) algorithm has received widespread attention for its powerful functionality and applicability to large-scale continuous control, it suffers from problems such as low sample-utilization efficiency and insufficient exploration. Therefore, an improved DDPG is presented in this article to overcome these challenges. First, an optimizer based on the fractional gradient is introduced into the algorithm's networks, which is conducive to increasing the speed and accuracy of training convergence. On this basis, high-value experience replay based on a weight-changed priority is proposed to improve sample-utilization efficiency, and, to explore the environment more thoroughly, an optimized exploration strategy for the boundary of the action space is adopted. Finally, the proposed method is tested through experiments on the Gym and PyBullet platforms. According to the results, our method speeds up the learning process and obtains higher average rewards than the compared algorithms.
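The abstract does not give the paper's exact update rule, but a fractional-gradient optimizer is commonly built on a Caputo-style derivative that rescales the ordinary gradient by a power of the recent parameter displacement. The sketch below is a hypothetical one-parameter illustration of that idea (the function name, signature, and the short-memory approximation are assumptions, not the authors' method); note that setting the fractional order `alpha = 1` recovers plain gradient descent.

```python
import math

def fractional_grad_step(w, grad, w_prev, lr=0.01, alpha=0.9):
    """One fractional-order (Caputo-style) gradient step on a scalar parameter.

    Hypothetical sketch: the ordinary gradient is scaled by
    |w - w_prev|**(1 - alpha) / Gamma(2 - alpha), a common short-memory
    approximation of the Caputo fractional derivative. With alpha = 1 the
    scale factor is exactly 1, recovering standard gradient descent.
    """
    eps = 1e-8  # guard: 0**(1-alpha) would zero the step when w == w_prev
    scale = abs(w - w_prev) ** (1.0 - alpha) / math.gamma(2.0 - alpha)
    return w - lr * grad * max(scale, eps)
```

For orders `alpha < 1` the step shrinks when the parameter has barely moved and grows with larger recent displacement, which is one intuition for the faster, more accurate convergence the abstract attributes to the fractional scheme.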

Original language: English
Pages (from-to): 6873-6882
Number of pages: 10
Journal: IEEE Transactions on Neural Networks and Learning Systems
Volume: 36
Issue: 4
DOI
Publication status: Published - 2025

