Smooth Clip Advantage PPO in Reinforcement Learning

Junwei Wang; Zilin Zeng; Peng Shang

doi:10.1088/1742-6596/2513/1/012005


Shenzhen Institute of Advanced Technology

Research output: Contribution to journal › Conference article › Peer-review

2 Citations (Scopus)

Abstract

Deep reinforcement learning outperforms traditional methods in some domains. In this paper, we propose a novel on-policy reinforcement learning (RL) algorithm, the Smoothing Clip Advantage Proximal Policy Optimization algorithm (SCAPPO), which extends the classical PPO algorithm by exploiting the smoothing properties of the sigmoid function to make full use of useful gradients. In addition, we provide the policy network with more effective gradients, aiming to address the overfitting caused by the coupling of the policy and value functions. SCAPPO outperforms currently popular reinforcement learning algorithms on OpenAI Gym tasks.
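The abstract does not state SCAPPO's objective, so the following is only a hypothetical illustration of the idea it describes: replacing PPO's hard clip of the probability ratio with a sigmoid-based smooth saturation, so that gradients outside the clip range are attenuated rather than set to zero. The `soft_clip` form (equivalent to 1 + eps·tanh((r − 1)/eps), written through the identity tanh(x) = 2σ(2x) − 1), the function names, and eps = 0.2 are assumptions for this sketch, not the authors' formulation.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def hard_clip_objective(ratio: float, adv: float, eps: float = 0.2) -> float:
    """Standard PPO per-sample surrogate: min(r*A, clip(r, 1-eps, 1+eps)*A)."""
    clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
    return min(ratio * adv, clipped * adv)


def soft_clip(ratio: float, eps: float = 0.2) -> float:
    """HYPOTHETICAL sigmoid-based soft clip (not the SCAPPO formula from the paper).

    Equals 1 at r = 1 with slope 1, and saturates smoothly toward
    [1 - eps, 1 + eps] instead of cutting off sharply.
    """
    return 1.0 + eps * (2.0 * sigmoid(2.0 * (ratio - 1.0) / eps) - 1.0)


def smooth_clip_objective(ratio: float, adv: float, eps: float = 0.2) -> float:
    """PPO-style surrogate with the hard clip swapped for soft_clip."""
    return min(ratio * adv, soft_clip(ratio, eps) * adv)


def numerical_grad(f, ratio: float, adv: float, h: float = 1e-6) -> float:
    """Central finite difference of f with respect to the probability ratio."""
    return (f(ratio + h, adv) - f(ratio - h, adv)) / (2.0 * h)


if __name__ == "__main__":
    # Compare d(objective)/d(ratio) at a ratio outside the clip range.
    for f in (hard_clip_objective, smooth_clip_objective):
        print(f.__name__, numerical_grad(f, 1.5, 1.0))
```

With ratio 1.5 and advantage 1.0, the hard-clipped surrogate has exactly zero gradient, while the smooth variant keeps a small positive one. Because the soft clip has slope 1 at r = 1, it behaves like the unclipped objective near the old policy and only departs from it as the ratio moves away.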

Original language	English
Article number	012005
Journal	Journal of Physics: Conference Series
Volume	2513
Issue number	1
DOI	https://doi.org/10.1088/1742-6596/2513/1/012005
Publication status	Published - 2023
Externally published	Yes
Event	2023 7th International Conference on Artificial Intelligence, Automation and Control Technologies, AIACT 2023 - Virtual, Online, China; Duration: 24 Feb 2023 → 26 Feb 2023


Cite this

@article{7ccec176c9444aa2aa43ed28a2368a5f,
title = "Smooth Clip Advantage PPO in Reinforcement Learning",
abstract = "Deep reinforcement learning outperforms traditional methods in some domains. In this paper, we propose a novel on-policy reinforcement learning (RL) algorithm, the Smoothing Clip Advantage Proximal Policy Optimization algorithm (SCAPPO), which extends the classical PPO algorithm by exploiting the smoothing properties of the sigmoid function to make full use of useful gradients. In addition, we provide the policy network with more effective gradients, aiming to address the overfitting caused by the coupling of the policy and value functions. SCAPPO outperforms currently popular reinforcement learning algorithms on OpenAI Gym tasks.",
author = "Junwei Wang and Zilin Zeng and Peng Shang",
note = "Publisher Copyright: {\textcopyright} Published under licence by IOP Publishing Ltd.; 2023 7th International Conference on Artificial Intelligence, Automation and Control Technologies, AIACT 2023; Conference date: 24-02-2023 through 26-02-2023",
year = "2023",
doi = "10.1088/1742-6596/2513/1/012005",
language = "English",
volume = "2513",
journal = "Journal of Physics: Conference Series",
issn = "1742-6588",
publisher = "IOP Publishing Ltd.",
number = "1",
}

TY  - JOUR
T1  - Smooth Clip Advantage PPO in Reinforcement Learning
AU  - Wang, Junwei
AU  - Zeng, Zilin
AU  - Shang, Peng
N1  - Publisher Copyright: © Published under licence by IOP Publishing Ltd.
PY  - 2023
Y1  - 2023
N2  - Deep reinforcement learning outperforms traditional methods in some domains. In this paper, we propose a novel on-policy reinforcement learning (RL) algorithm, the Smoothing Clip Advantage Proximal Policy Optimization algorithm (SCAPPO), which extends the classical PPO algorithm by exploiting the smoothing properties of the sigmoid function to make full use of useful gradients. In addition, we provide the policy network with more effective gradients, aiming to address the overfitting caused by the coupling of the policy and value functions. SCAPPO outperforms currently popular reinforcement learning algorithms on OpenAI Gym tasks.
AB  - Deep reinforcement learning outperforms traditional methods in some domains. In this paper, we propose a novel on-policy reinforcement learning (RL) algorithm, the Smoothing Clip Advantage Proximal Policy Optimization algorithm (SCAPPO), which extends the classical PPO algorithm by exploiting the smoothing properties of the sigmoid function to make full use of useful gradients. In addition, we provide the policy network with more effective gradients, aiming to address the overfitting caused by the coupling of the policy and value functions. SCAPPO outperforms currently popular reinforcement learning algorithms on OpenAI Gym tasks.
UR  - http://www.scopus.com/inward/record.url?scp=85166672354&partnerID=8YFLogxK
U2  - 10.1088/1742-6596/2513/1/012005
DO  - 10.1088/1742-6596/2513/1/012005
M3  - Conference article
AN  - SCOPUS:85166672354
SN  - 1742-6588
VL  - 2513
JO  - Journal of Physics: Conference Series
JF  - Journal of Physics: Conference Series
IS  - 1
M1  - 012005
T2  - 2023 7th International Conference on Artificial Intelligence, Automation and Control Technologies, AIACT 2023
Y2  - 24 February 2023 through 26 February 2023
ER  -
