TY - JOUR
T1 - A novel policy based on action confidence limit to improve exploration efficiency in reinforcement learning
AU - Huang, Fanghui
AU - Deng, Xinyang
AU - He, Yixin
AU - Jiang, Wen
N1 - Publisher Copyright:
© 2023 Elsevier Inc.
PY - 2023/9
Y1 - 2023/9
N2 - Reinforcement learning has been used to solve many intelligent decision-making problems. In practice, however, it still suffers from low exploration efficiency, which limits its widespread application. To address this issue, this paper proposes a novel exploration policy based on a Q value and an exploration value. The exploration value uses an action confidence limit to measure the uncertainty of each action, guiding the agent to adaptively explore uncertain regions of the environment; this improves exploration efficiency and helps the agent make optimal decisions. To make the proposed policy applicable to both discrete and continuous environments, we combine it with two classic reinforcement learning algorithms, Q-learning and the deep Q-network, yielding two novel algorithms, and we analyze their convergence. Furthermore, a deep auto-encoder network is used to learn the state-action mapping in discrete environments, avoiding the storage of a large number of state-action pairs during the Q-learning stage. The proposed method thus achieves adaptive and effective exploration, helping the agent make intelligent decisions. Finally, the method is evaluated in discrete and continuous simulation environments, and the experimental results demonstrate that it improves the average reward and reduces the number of catastrophic actions.
AB - Reinforcement learning has been used to solve many intelligent decision-making problems. In practice, however, it still suffers from low exploration efficiency, which limits its widespread application. To address this issue, this paper proposes a novel exploration policy based on a Q value and an exploration value. The exploration value uses an action confidence limit to measure the uncertainty of each action, guiding the agent to adaptively explore uncertain regions of the environment; this improves exploration efficiency and helps the agent make optimal decisions. To make the proposed policy applicable to both discrete and continuous environments, we combine it with two classic reinforcement learning algorithms, Q-learning and the deep Q-network, yielding two novel algorithms, and we analyze their convergence. Furthermore, a deep auto-encoder network is used to learn the state-action mapping in discrete environments, avoiding the storage of a large number of state-action pairs during the Q-learning stage. The proposed method thus achieves adaptive and effective exploration, helping the agent make intelligent decisions. Finally, the method is evaluated in discrete and continuous simulation environments, and the experimental results demonstrate that it improves the average reward and reduces the number of catastrophic actions.
KW - Action confidence limit
KW - Deep auto-encoder network
KW - Exploration policy
KW - Reinforcement learning
KW - Uncertainty of action
UR - http://www.scopus.com/inward/record.url?scp=85156101305&partnerID=8YFLogxK
U2 - 10.1016/j.ins.2023.119011
DO - 10.1016/j.ins.2023.119011
M3 - Article
AN - SCOPUS:85156101305
SN - 0020-0255
VL - 640
JO - Information Sciences
JF - Information Sciences
M1 - 119011
ER -