Smooth Clip Advantage PPO in Reinforcement Learning

Junwei Wang, Zilin Zeng, Peng Shang

Research output: Contribution to journal › Conference article › peer-review

2 Scopus citations

Abstract

Deep reinforcement learning outperforms traditional methods in some domains. In this paper, we propose a novel on-policy reinforcement learning (RL) algorithm, the Smoothing Clip Advantage Proximal Policy Optimization algorithm (SCAPPO), which extends the classical PPO algorithm by exploiting the smoothing properties of the sigmoid function to make full use of useful gradients. In addition, we provide more efficient and effective gradients for the policy network, aiming to solve the overfitting problem caused by the coupling of the policy and value functions. SCAPPO outperforms currently popular reinforcement learning algorithms on tasks in OpenAI Gym.
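
The abstract does not state the SCAPPO objective, so the following is only a minimal sketch of the general idea under stated assumptions. `hard_clip_surrogate` is the standard per-sample PPO clipped surrogate. `smooth_clip` and `smooth_clip_surrogate` are hypothetical illustrations of how a sigmoid could replace the hard clip so that the gradient fades gradually instead of dropping to zero at the clip boundary; their form, the slope choice `k = 2 / eps`, and all function names are assumptions, not the authors' formulation.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def hard_clip_surrogate(ratio: float, advantage: float, eps: float = 0.2) -> float:
    """Standard PPO clipped surrogate for a single sample.

    ratio is pi_new(a|s) / pi_old(a|s); eps is the clip range.
    """
    clipped = max(1.0 - eps, min(1.0 + eps, ratio))
    return min(ratio * advantage, clipped * advantage)


def smooth_clip(ratio: float, eps: float = 0.2) -> float:
    """Hypothetical sigmoid-based soft clip (an assumption, not from the paper).

    Maps the ratio smoothly into (1 - eps, 1 + eps). With k = 2 / eps the
    function equals 1 and has slope 1 at ratio = 1, so it behaves like the
    identity near the old policy and saturates gradually further away.
    """
    k = 2.0 / eps
    return 1.0 - eps + 2.0 * eps * sigmoid(k * (ratio - 1.0))


def smooth_clip_surrogate(ratio: float, advantage: float, eps: float = 0.2) -> float:
    """Hypothetical surrogate: the hard clip swapped for the soft clip above."""
    return min(ratio * advantage, smooth_clip(ratio, eps) * advantage)


if __name__ == "__main__":
    # Compare the two clipping rules over a few probability ratios.
    for r in (0.5, 0.8, 1.0, 1.2, 1.5):
        hard = max(0.8, min(1.2, r))
        print(f"ratio={r:.1f}  hard clip={hard:.3f}  smooth clip={smooth_clip(r):.3f}")
```

Under this assumed form, a ratio slightly outside the clip range still contributes a small nonzero gradient, which is one plausible reading of "making full use of useful gradients"; the published method may differ.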

Original language: English
Article number: 012005
Journal: Journal of Physics: Conference Series
Volume: 2513
Issue number: 1
State: Published - 2023
Externally published: Yes
Event: 2023 7th International Conference on Artificial Intelligence, Automation and Control Technologies, AIACT 2023 - Virtual, Online, China
Duration: 24 Feb 2023 → 26 Feb 2023
