A sample aggregation approach to experiences replay of Dyna-Q learning

Haobin Shi, Shike Yang, Kao Shing Hwang, Jialin Chen, Mengkai Hu, Hengsheng Zhang

Research output: Contribution to journal › Article › peer-review

7 Citations (Scopus)

Abstract

In complex environments, the learning efficiency of reinforcement learning methods decreases on large-scale or continuous state spaces, a difficulty known as the curse of dimensionality. To address this problem and enhance learning efficiency, this paper introduces an aggregation method, a framework of sample aggregation based on the Chinese restaurant process (CRP), named FSA-CRP, to cluster experiential samples, each represented as a quadruple of the current state, action, next state, and obtained reward. In addition, the proposed algorithm applies a similarity estimation method, MinHash, to compute the similarity between samples. Moreover, to improve learning efficiency, an experience-sharing Dyna learning algorithm based on a sample/cluster prediction method is proposed: while an agent learns the value function of the current state, it obtains the clustering results, and the value function of the sample is merged with the original to form the updated value function of the cluster. In the indirect learning (planning) phase of Dyna-Q, the learning agent searches the most likely branches of the constructed FSA-CRP model to raise learning efficiency. These branches are selected by an improved action/sample selection algorithm that uses the probability of a sample appearing in its cluster to choose simulated experiences for indirect learning. To verify the validity and applicability of the proposed method, experiments are conducted on a simulated maze and a cart-pole system. The results demonstrate that the proposed method effectively accelerates the learning process.
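To make the overall idea concrete, the sketch below shows a minimal, hypothetical Dyna-Q agent whose replay buffer is organized into clusters of (state, action, reward, next state) samples, with a CRP-style "rich get richer" assignment rule and planning steps that draw simulated experiences from clusters in proportion to their occupancy. This is only an illustration under simplifying assumptions: the class name, parameters, and the exact cluster-assignment rule are invented here, the MinHash similarity estimation is replaced by exact state-action matching, and the code does not reproduce the paper's FSA-CRP algorithm.

```python
"""Hypothetical sketch: Dyna-Q with cluster-based experience replay.

Not the paper's FSA-CRP method; a simplified illustration of clustering
experiential samples and biasing planning toward the most populated clusters.
"""
import random
from collections import defaultdict


class ClusteredDynaQ:
    def __init__(self, actions, alpha=0.1, gamma=0.95, epsilon=0.1,
                 planning_steps=10, crp_concentration=1.0):
        self.q = defaultdict(float)            # Q[(state, action)]
        self.actions = actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.planning_steps = planning_steps
        self.crp_concentration = crp_concentration
        self.clusters = []                     # each cluster: list of (s, a, r, s') samples

    def choose_action(self, state):
        # epsilon-greedy action selection
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def _assign_cluster(self, sample):
        # CRP-style assignment: join an existing cluster with probability
        # proportional to its size if it already contains a matching (s, a)
        # pair; otherwise open a new cluster with weight ~ concentration.
        s, a, _, _ = sample
        candidates, weights = [], []
        for cluster in self.clusters:
            if any(cs == s and ca == a for cs, ca, _, _ in cluster):
                candidates.append(cluster)
                weights.append(len(cluster))
        candidates.append(None)                # "new table"
        weights.append(self.crp_concentration)
        chosen = random.choices(candidates, weights=weights)[0]
        if chosen is None:
            chosen = []
            self.clusters.append(chosen)
        chosen.append(sample)

    def learn(self, s, a, r, s_next):
        # direct learning: standard Q-learning update from real experience
        target = r + self.gamma * max(self.q[(s_next, b)] for b in self.actions)
        self.q[(s, a)] += self.alpha * (target - self.q[(s, a)])
        self._assign_cluster((s, a, r, s_next))
        self._plan()

    def _plan(self):
        # indirect learning: replay simulated experiences, favoring clusters
        # with more samples (a stand-in for "most likely branches")
        if not self.clusters:
            return
        weights = [len(c) for c in self.clusters]
        for _ in range(self.planning_steps):
            cluster = random.choices(self.clusters, weights=weights)[0]
            s, a, r, s_next = random.choice(cluster)
            target = r + self.gamma * max(self.q[(s_next, b)] for b in self.actions)
            self.q[(s, a)] += self.alpha * (target - self.q[(s, a)])
```

The design choice illustrated here is the one the abstract emphasizes: planning updates are not drawn uniformly from all stored samples but are weighted toward frequently visited clusters, which is what allows indirect learning to focus on the most likely branches of the learned model.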

Original language: English
Pages (from-to): 37173-37184
Number of pages: 12
Journal: IEEE Access
Volume: 6
DOI
Publication status: Published - 12 Jun 2018
