异策略深度强化学习中的经验回放研究综述

Zi Jian Hu, Xiao Guang Gao, Kai Fang Wan, Le Tian Zhang, Qiang Long Wang, Evgeny Neretin

Research output: Contribution to journal › Article › peer-review

4 Citations (Scopus)

Abstract

As a machine learning method that does not require training data to be collected in advance, reinforcement learning (RL) is an important approach to sequential decision-making: it finds the optimal policy through continuous interaction between the agent and the environment. Combined with deep learning (DL), deep reinforcement learning (DRL) has both powerful perception and decision-making capabilities and is widely used in many fields to solve complex decision-making problems. Off-policy reinforcement learning separates exploration from exploitation by storing and replaying interaction experience, making it easier to find the global optimal solution. How to use experience reasonably and efficiently is therefore key to improving the efficiency of off-policy reinforcement learning methods. First, this paper introduces the basic theory of reinforcement learning. Then, on-policy and off-policy reinforcement learning algorithms are briefly introduced. Next, two mainstream lines of work on the experience replay (ER) problem are introduced: experience utilization and experience expansion. Finally, the relevant research is summarized and future directions are discussed.
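The abstract's core mechanism, storing interaction experience and replaying it for off-policy learning, can be illustrated with a minimal replay buffer. This is a generic sketch for illustration only (the class and method names are not taken from the paper): transitions are stored as tuples and later sampled uniformly, which decorrelates consecutive steps and lets the agent reuse old experience gathered under earlier policies.

```python
import random
from collections import deque


class ReplayBuffer:
    """Minimal FIFO experience replay buffer (illustrative sketch)."""

    def __init__(self, capacity):
        # Bounded deque: once full, the oldest experience is evicted first.
        self.buffer = deque(maxlen=capacity)

    def store(self, state, action, reward, next_state, done):
        # Each agent-environment interaction step is stored as one transition.
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform random sampling breaks the temporal correlation
        # between consecutive transitions in a trajectory.
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)


# Usage: collect a few dummy transitions, then draw a training minibatch.
buf = ReplayBuffer(capacity=100)
for t in range(10):
    buf.store(t, 0, 1.0, t + 1, False)
batch = buf.sample(4)
```

Prioritized or otherwise weighted sampling schemes surveyed by the paper replace the uniform `sample` step; the storage side stays essentially the same.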

Translated title of the contribution: Research on Experience Replay of Off-policy Deep Reinforcement Learning: A Review
Original language: Traditional Chinese
Pages (from-to): 2237-2256
Number of pages: 20
Journal: Zidonghua Xuebao/Acta Automatica Sinica
Volume: 49
Issue number: 11
DOI
Publication status: Published - Nov 2023

Keywords

  • Artificial intelligence
  • Deep reinforcement learning (DRL)
  • Experience replay (ER)
  • Off-policy
