Advantage Policy Update Based on Proximal Policy Optimization

Zilin Zeng, Junwei Wang, Zhigang Hu, Dongnan Su, Peng Shang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

1 Citation (Scopus)

Abstract

In this paper, a novel policy network update approach based on Proximal Policy Optimization (PPO), called Advantageous Update Policy Proximal Policy Optimization (AUP-PPO), is proposed to alleviate the over-fitting caused by sharing layers between the policy and value functions. Extending the sample-efficient reinforcement learning method PPO, which uses separate networks to learn the policy and value functions so that their optimization is decoupled, AUP-PPO uses the value function to compute the advantage and updates the policy with the loss between the current and target advantage functions as a penalty term, in place of the value-function loss. Evaluated on multiple benchmark control tasks in OpenAI Gym, AUP-PPO exhibits better generalization to the environment and achieves faster convergence and better robustness than the original PPO.
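The abstract describes a PPO-style objective in which the penalty term is built from the advantage function rather than the value function. The short PyTorch sketch below illustrates one way such a loss could be assembled; the coefficient names (clip_eps, adv_coef), the mean-squared-error comparison of current and target advantages, and the function signature are assumptions for illustration, not the paper's exact AUP-PPO formulation.

import torch

def aup_ppo_loss(new_log_probs, old_log_probs, advantages,
                 current_advantages, target_advantages,
                 clip_eps=0.2, adv_coef=0.5):
    # Standard PPO clipped surrogate objective on the policy ratio.
    ratio = torch.exp(new_log_probs - old_log_probs)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    policy_loss = -torch.min(unclipped, clipped).mean()

    # Penalty term: loss between current and target advantage estimates,
    # used here in place of the usual value-function loss (per the abstract).
    # The MSE form and adv_coef weighting are assumptions for this sketch.
    advantage_loss = torch.nn.functional.mse_loss(current_advantages,
                                                  target_advantages)

    return policy_loss + adv_coef * advantage_loss

In practice such a loss would be minimized over minibatches of rollout data, with the advantage estimates detached from the policy gradient as in standard PPO implementations.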

Original language: English
Title of host publication: Third International Seminar on Artificial Intelligence, Networking, and Information Technology, AINIT 2022
Editors: Naijing Hu, Guanglin Zhang
Publisher: SPIE
ISBN (Electronic): 9781510662964
DOI
Publication status: Published - 2023
Externally published: Yes
Event: 3rd International Seminar on Artificial Intelligence, Networking, and Information Technology, AINIT 2022 - Shanghai, China
Duration: 23 Sep 2022 → 25 Sep 2022

Publication series

Name: Proceedings of SPIE - The International Society for Optical Engineering
Volume: 12587
ISSN (Print): 0277-786X
ISSN (Electronic): 1996-756X

Conference

Conference: 3rd International Seminar on Artificial Intelligence, Networking, and Information Technology, AINIT 2022
Country/Territory: China
City: Shanghai
Period: 23/09/22 → 25/09/22
