Smooth Clip Advantage PPO in Reinforcement Learning

Junwei Wang, Zilin Zeng, Peng Shang

Research output: Contribution to journal, Conference article, peer-reviewed

2 Citations (Scopus)

Abstract

Deep reinforcement learning outperforms traditional methods in some domains. In this paper, we propose a novel on-policy reinforcement learning (RL) algorithm, the Smoothing Clip Advantage Proximal Policy Optimization algorithm (SCAPPO), which extends the classical PPO algorithm by exploiting the smoothing properties of the sigmoid function to make full use of useful gradients. In addition, we provide more effective gradients for the policy network, aiming to solve the overfitting problem caused by the coupling of the policy and value functions. SCAPPO outperforms currently popular reinforcement learning algorithms on tasks in OpenAI Gym.
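
The abstract does not state the SCAPPO objective itself, so the following is only a minimal sketch of the general idea it describes: replacing PPO's hard clip on the probability ratio with a sigmoid-shaped smooth clip, so that samples outside the clip range still contribute a small gradient. The function `smooth_clip`, its slope constant `k`, and the helper `surrogate` are illustrative assumptions, not the authors' formulation.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def hard_clip(ratio: float, eps: float = 0.2) -> float:
    """Standard PPO clipping of the probability ratio to [1 - eps, 1 + eps]."""
    return max(1.0 - eps, min(1.0 + eps, ratio))


def smooth_clip(ratio: float, eps: float = 0.2) -> float:
    """Hypothetical sigmoid-based soft clip (an assumption, not from the paper).

    Maps ratio = 1 to 1 with unit slope and saturates smoothly towards
    1 - eps and 1 + eps, so its derivative shrinks but stays positive
    (up to floating-point precision) instead of dropping to exactly zero.
    """
    k = 2.0 / eps  # chosen so that d(smooth_clip)/d(ratio) = 1 at ratio = 1
    return 1.0 - eps + 2.0 * eps * sigmoid(k * (ratio - 1.0))


def surrogate(ratios, advantages, clip_fn, eps=0.2):
    """Mean PPO-style surrogate: min(r * A, clip_fn(r) * A) over samples."""
    terms = [min(r * a, clip_fn(r, eps) * a) for r, a in zip(ratios, advantages)]
    return sum(terms) / len(terms)


if __name__ == "__main__":
    # Made-up ratios and advantages, for illustration only.
    ratios = [0.5, 0.9, 1.0, 1.1, 1.5]
    advantages = [1.0, -0.5, 2.0, 0.3, 1.0]
    print("hard clip surrogate:  ", surrogate(ratios, advantages, hard_clip))
    print("smooth clip surrogate:", surrogate(ratios, advantages, smooth_clip))
```

Near ratio = 1 the two clip functions nearly coincide; they differ mainly close to and beyond the clip boundaries, where the hard clip is flat and the smooth version still varies with the ratio. In an actual training loop the same expression would be written in an automatic-differentiation framework so that the gradient flows through the smooth clip.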

Original language: English
Article number: 012005
Journal: Journal of Physics: Conference Series
Volume: 2513
Issue number: 1
DOI
Publication status: Published - 2023
Externally published
Event: 2023 7th International Conference on Artificial Intelligence, Automation and Control Technologies, AIACT 2023 - Virtual, Online, China
Duration: 24 Feb 2023 to 26 Feb 2023
