Abstract
As a machine learning paradigm that requires no pre-collected training data, reinforcement learning (RL) solves sequential decision-making problems by finding an optimal policy through continuous interaction between an agent and its environment. By combining RL with deep learning (DL), deep reinforcement learning (DRL) gains both powerful perception and decision-making capabilities and is widely applied to complex decision-making problems in many fields. Off-policy reinforcement learning separates exploration from exploitation by storing and replaying interaction experience, which makes it easier to find a globally optimal solution. Making reasonable and efficient use of this experience is the key to improving the efficiency of off-policy methods. This paper first introduces the basic theory of reinforcement learning, then briefly reviews on-policy and off-policy algorithms. Next, two mainstream lines of work on experience replay (ER) are surveyed: experience utilization and experience expansion. Finally, the relevant research is summarized and future directions are discussed.
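The "storing and replaying interaction experience" mechanism described above can be sketched as a simple replay buffer. The following is a minimal illustrative sketch, not the paper's implementation: it assumes uniform random sampling (the baseline that prioritized and other ER variants improve upon), and all names here are hypothetical.

```python
import random
from collections import deque

class ReplayBuffer:
    """Minimal experience replay buffer (illustrative sketch, uniform sampling)."""

    def __init__(self, capacity):
        # A bounded deque: once full, the oldest transitions are evicted first.
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        # Store one interaction transition (s, a, r, s', done) from the environment.
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniformly sample a mini-batch; random sampling breaks the temporal
        # correlation between consecutive transitions, stabilizing training.
        batch = random.sample(self.buffer, batch_size)
        states, actions, rewards, next_states, dones = zip(*batch)
        return states, actions, rewards, next_states, dones

    def __len__(self):
        return len(self.buffer)
```

Because the buffer decouples data collection from learning, the agent can update its policy from past experience generated by older behavior policies, which is what makes the method off-policy.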
| Translated title of the contribution | Research on Experience Replay of Off-policy Deep Reinforcement Learning: A Review |
| --- | --- |
| Original language | Chinese (Traditional) |
| Pages (from-to) | 2237-2256 |
| Number of pages | 20 |
| Journal | Zidonghua Xuebao/Acta Automatica Sinica |
| Volume | 49 |
| Issue number | 11 |
| DOIs | |
| State | Published - Nov 2023 |