异策略深度强化学习中的经验回放研究综述

Zi Jian Hu, Xiao Guang Gao, Kai Fang Wan, Le Tian Zhang, Qiang Long Wang, Evgeny Neretin

Research output: Contribution to journal › Article › peer-review

4 Citations (Scopus)

Abstract

As a machine learning method that does not require training data to be collected in advance, reinforcement learning (RL) is an important approach to sequential decision-making: it finds the optimal policy through continuous interaction between the agent and the environment. Combined with deep learning (DL), deep reinforcement learning (DRL) has both powerful perception and decision-making capabilities and is widely used in many fields to solve complex decision-making problems. Off-policy reinforcement learning separates exploration from exploitation by storing and replaying interaction experience, making it easier to find the global optimal solution. How to use experience reasonably and efficiently is therefore key to improving the efficiency of off-policy reinforcement learning methods. First, this paper introduces the basic theory of reinforcement learning. Then, on-policy and off-policy reinforcement learning algorithms are briefly introduced. Next, two mainstream lines of work on the experience replay (ER) problem are introduced: experience utilization and experience expansion. Finally, the relevant research is summarized and future directions are discussed.
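The abstract's core mechanism, storing interaction experience and replaying it for off-policy learning, can be illustrated with a minimal replay buffer. This is a generic sketch for illustration only (the class and method names are not taken from the paper): transitions are stored as tuples and later sampled uniformly, which decorrelates consecutive steps and lets the agent reuse old experience gathered under earlier policies.

```python
import random
from collections import deque


class ReplayBuffer:
    """Minimal FIFO experience replay buffer (illustrative sketch)."""

    def __init__(self, capacity):
        # Bounded deque: once full, the oldest experience is evicted first.
        self.buffer = deque(maxlen=capacity)

    def store(self, state, action, reward, next_state, done):
        # Each agent-environment interaction step is stored as one transition.
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform random sampling breaks the temporal correlation
        # between consecutive transitions in a trajectory.
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)


# Usage: collect a few dummy transitions, then draw a training minibatch.
buf = ReplayBuffer(capacity=100)
for t in range(10):
    buf.store(t, 0, 1.0, t + 1, False)
batch = buf.sample(4)
```

Prioritized or otherwise weighted sampling schemes surveyed by the paper replace the uniform `sample` step; the storage side stays essentially the same.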

Translated title of the contribution: Research on Experience Replay of Off-policy Deep Reinforcement Learning: A Review
Original language: Traditional Chinese
Pages (from-to): 2237-2256
Number of pages: 20
Journal: Zidonghua Xuebao/Acta Automatica Sinica
Volume: 49
Issue number: 11
DOI
Publication status: Published - Nov 2023

Keywords

  • Artificial intelligence
  • Deep reinforcement learning (DRL)
  • Experience replay (ER)
  • Off-policy
