TY - JOUR
T1 - An Improved Prioritized DDPG Based on Fractional-Order Learning Scheme
AU - Fan, Quan Yong
AU - Cai, Meiying
AU - Xu, Bin
N1 - Publisher Copyright: © 2012 IEEE.
PY - 2025
Y1 - 2025
AB - Although the deep deterministic policy gradient (DDPG) algorithm has attracted widespread attention for its powerful functionality and applicability to large-scale continuous control, it suffers from problems such as low sample utilization efficiency and insufficient exploration. This article therefore presents an improved DDPG to overcome these challenges. First, an optimizer based on the fractional gradient is introduced into the algorithm's network, which is conducive to increasing the speed and accuracy of training convergence. On this basis, high-value experience replay based on weight-changed priority is proposed to improve sample utilization efficiency, and an optimized exploration strategy for the boundary action space is adopted to explore the environment more thoroughly. Finally, the proposed method is tested in experiments on the gym and pybullet platforms. The results show that the method speeds up the learning process and obtains higher average rewards than other algorithms.
KW - Deep deterministic policy gradient (DDPG)
KW - fractional-order gradient
KW - prioritized experience replay
KW - reinforcement learning (RL)
UR - http://www.scopus.com/inward/record.url?scp=105002298514&partnerID=8YFLogxK
U2 - 10.1109/TNNLS.2024.3395508
DO - 10.1109/TNNLS.2024.3395508
M3 - Article
AN - SCOPUS:105002298514
SN - 2162-237X
VL - 36
SP - 6873
EP - 6882
JO - IEEE Transactions on Neural Networks and Learning Systems
JF - IEEE Transactions on Neural Networks and Learning Systems
IS - 4
ER -