A novel policy based on action confidence limit to improve exploration efficiency in reinforcement learning

Fanghui Huang, Xinyang Deng, Yixin He, Wen Jiang

Research output: Contribution to journal › Article › peer-review

13 Citations (Scopus)

Abstract

Reinforcement learning has been used to solve many intelligent decision-making problems. In practice, however, it still suffers from low exploration efficiency, which limits its widespread application. To address this issue, this paper proposes a novel exploration policy based on the Q value and an exploration value. The exploration value uses an action confidence limit to measure the uncertainty of each action, guiding the agent to adaptively explore uncertain regions of the environment; this improves exploration efficiency and helps the agent make optimal decisions. To make the proposed policy applicable to both discrete and continuous environments, we combine it with two classic reinforcement learning algorithms, Q-learning and the deep Q-network (DQN), yielding two novel algorithms. The convergence of the algorithms is also analyzed. Furthermore, a deep auto-encoder network is used to learn the state-action mapping in discrete environments, which avoids storing a large number of state-action pairs during the Q-learning stage. The proposed method achieves adaptive and effective exploration, helping the agent make intelligent decisions. Finally, the method is evaluated in discrete and continuous simulation environments; experimental results show that it improves the average reward and reduces the number of catastrophic actions.
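The abstract describes the policy only at a high level and does not reproduce the exact action-confidence-limit formula. The following is a minimal sketch of the general idea, ranking actions by Q value plus an exploration value that shrinks as an action becomes less uncertain, using a generic UCB-style count-based bonus as a stand-in. The class name, the coefficient c, and the bonus term are illustrative assumptions, not the paper's method.

import numpy as np

# Sketch: tabular Q-learning where action selection adds an exploration value
# to the Q value. The bonus below is a generic count-based (UCB-style) term
# standing in for the paper's action confidence limit.
class ConfidenceBonusQLearning:
    def __init__(self, n_states, n_actions, alpha=0.1, gamma=0.99, c=1.0):
        self.Q = np.zeros((n_states, n_actions))      # action-value estimates
        self.counts = np.ones((n_states, n_actions))  # visit counts (init to 1 to avoid division by zero)
        self.alpha, self.gamma, self.c = alpha, gamma, c

    def select_action(self, state):
        # Exploration value is larger for rarely tried actions, guiding the
        # agent toward uncertain regions of the state-action space.
        total_visits = self.counts[state].sum()
        bonus = self.c * np.sqrt(np.log(total_visits) / self.counts[state])
        return int(np.argmax(self.Q[state] + bonus))

    def update(self, state, action, reward, next_state):
        # Standard Q-learning update; the exploration value affects only
        # which action is chosen, not the learning target.
        td_target = reward + self.gamma * self.Q[next_state].max()
        self.Q[state, action] += self.alpha * (td_target - self.Q[state, action])
        self.counts[state, action] += 1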

Original language: English
Article number: 119011
Journal: Information Sciences
Volume: 640
DOI
Publication status: Published - Sep 2023
