Abstract
Deep reinforcement learning outperforms traditional methods in some domains. In this paper, we propose a novel on-policy reinforcement learning (RL) algorithm, the Smoothing Clip Advantage Proximal Policy Optimization algorithm (SCAPPO), which extends the classical PPO algorithm by exploiting the smoothing properties of the sigmoid function to make full use of useful gradients. In addition, we provide more efficient gradients for the policy network, aiming to solve the overfitting problem caused by the coupling of the policy and value functions. SCAPPO outperforms currently popular reinforcement learning algorithms on tasks in the OpenAI Gym.
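
To make the idea of sigmoid smoothing concrete, the following minimal sketch contrasts the standard PPO clipped surrogate with a smooth, sigmoid-based stand-in for the clip function. The abstract does not give the SCAPPO objective, so `smooth_clip` and its scaling are assumptions chosen purely for illustration and should not be read as the paper's formula. Only the hard-clipped objective is the well-known PPO form.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def hard_clip(ratio: float, eps: float = 0.2) -> float:
    """Standard PPO clip(r, 1 - eps, 1 + eps)."""
    return max(1.0 - eps, min(ratio, 1.0 + eps))


def ppo_clip_objective(ratio: float, advantage: float, eps: float = 0.2) -> float:
    """Standard per-sample PPO surrogate: min(r * A, clip(r) * A)."""
    return min(ratio * advantage, hard_clip(ratio, eps) * advantage)


def hard_clip_grad(ratio: float, eps: float = 0.2) -> float:
    """Derivative of hard_clip w.r.t. the ratio: 1 inside the interval, 0 outside."""
    return 1.0 if 1.0 - eps < ratio < 1.0 + eps else 0.0


def smooth_clip(ratio: float, eps: float = 0.2) -> float:
    """ASSUMED illustrative smoothing, not the paper's formula.

    Squashes the ratio into (1 - eps, 1 + eps) with a sigmoid, scaled so that
    the value is 1 and the slope is 1 at ratio == 1.
    """
    return 1.0 - eps + 2.0 * eps * sigmoid(2.0 * (ratio - 1.0) / eps)


def smooth_clip_grad(ratio: float, eps: float = 0.2) -> float:
    """Analytic derivative of smooth_clip w.r.t. the ratio: 4 * s * (1 - s)."""
    s = sigmoid(2.0 * (ratio - 1.0) / eps)
    return 4.0 * s * (1.0 - s)
```

With a probability ratio of 1.5 and `eps = 0.2`, `hard_clip_grad(1.5)` is exactly zero, so a sample whose clipped term is active contributes no gradient, whereas `smooth_clip_grad(1.5)` is small but nonzero. This is one plausible reading of "making full use of useful gradients" under the stated assumptions.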
Original language | English |
---|---|
Article number | 012005 |
Journal | Journal of Physics: Conference Series |
Volume | 2513 |
Issue number | 1 |
DOIs | |
State | Published - 2023 |
Externally published | Yes |
Event | 2023 7th International Conference on Artificial Intelligence, Automation and Control Technologies, AIACT 2023 - Virtual, Online, China. Duration: 24 Feb 2023 → 26 Feb 2023 |