Controlling underestimation bias in reinforcement learning via minmax operation

Fanghui HUANG; Yixin HE; Yu ZHANG; Xinyang DENG; Wen JIANG

doi:10.1016/j.cja.2024.03.008

School of Electronics and Information

Research output: Contribution to journal › Article › peer-review

2 citations (Scopus)

Abstract

Obtaining accurate value estimates and reducing estimation bias are key issues in reinforcement learning. However, current methods that address the overestimation problem tend to introduce underestimation, which poses a challenge for precise decision-making in many fields. To address this issue, we conduct a theoretical analysis of the underestimation bias and propose the minmax operation, which allows flexible control of the estimation bias. Specifically, we select the maximum value of each action from multiple parallel state-action networks to create a new state-action value sequence. Then, a minimum value is selected to obtain more accurate value estimates. Moreover, based on the minmax operation, we propose two novel algorithms, named minmax-DQN and minmax-DDQN, by combining it with Deep Q-Network (DQN) and Double DQN (DDQN). Meanwhile, we conduct theoretical analyses of the estimation bias and variance caused by the proposed minmax operation, which show that this operation significantly reduces both underestimation and overestimation biases and leads to unbiased estimation. Furthermore, the variance is also reduced, which helps improve network training stability. Finally, we conduct numerous comparative experiments in various environments, which empirically demonstrate the superiority of our method.
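
The two steps of the minmax operation described in the abstract can be sketched as follows. This is only one literal reading of the abstract, not the authors' implementation: the function names `per_action_max` and `minmax_value`, the nested-list layout `q_values[i][a]` (network `i`, action `a`, for a single state), and the choice to take the minimum over the per-action sequence are all assumptions. The abstract does not say how the resulting value enters the DQN or DDQN target, so that part is left out.

```python
from typing import List, Sequence


def per_action_max(q_values: Sequence[Sequence[float]]) -> List[float]:
    """For each action, take the maximum estimate across the parallel networks.

    q_values[i][a] is assumed to be network i's estimate for action a
    at one fixed state (hypothetical layout, not taken from the paper).
    """
    if not q_values or not q_values[0]:
        raise ValueError("need at least one network and one action")
    n_actions = len(q_values[0])
    if any(len(q) != n_actions for q in q_values):
        raise ValueError("all networks must score the same set of actions")
    return [max(q[a] for q in q_values) for a in range(n_actions)]


def minmax_value(q_values: Sequence[Sequence[float]]) -> float:
    """Select the minimum of the per-action maxima (assumed reading of step 2)."""
    return min(per_action_max(q_values))


# Two hypothetical networks scoring three actions at the same state.
estimates = [
    [1.0, 4.0, 2.0],  # network 0
    [3.0, 0.5, 2.5],  # network 1
]
print(per_action_max(estimates))
print(minmax_value(estimates))
```

The number of parallel networks is the length of the outer list, so the same sketch covers any ensemble size.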

Original language	English
Pages (from-to)	406-417
Number of pages	12
Journal	Chinese Journal of Aeronautics
Volume	37
Issue number	7
DOI	https://doi.org/10.1016/j.cja.2024.03.008
Publication status	Published - Jul 2024

Cite this

@article{079c807c119e48a396bb7f890f7920f5,

title = "Controlling underestimation bias in reinforcement learning via minmax operation",

abstract = "Obtaining the accurate value estimation and reducing the estimation bias are the key issues in reinforcement learning. However, current methods that address the overestimation problem tend to introduce underestimation, which face a challenge of precise decision-making in many fields. To address this issue, we conduct a theoretical analysis of the underestimation bias and propose the minmax operation, which allow for flexible control of the estimation bias. Specifically, we select the maximum value of each action from multiple parallel state-action networks to create a new state-action value sequence. Then, a minimum value is selected to obtain more accurate value estimations. Moreover, based on the minmax operation, we propose two novel algorithms by combining Deep Q-Network (DQN) and Double DQN (DDQN), named minmax-DQN and minmax-DDQN. Meanwhile, we conduct theoretical analyses of the estimation bias and variance caused by our proposed minmax operation, which show that this operation significantly improves both underestimation and overestimation biases and leads to the unbiased estimation. Furthermore, the variance is also reduced, which is helpful to improve the network training stability. Finally, we conduct numerous comparative experiments in various environments, which empirically demonstrate the superiority of our method.",

keywords = "Estimation bias, Minmax operation, Reinforcement learning, Underestimation bias, Variance",

author = "Fanghui HUANG and Yixin HE and Yu ZHANG and Xinyang DENG and Wen JIANG",

note = "Publisher Copyright: {\textcopyright} 2024",

year = "2024",

month = jul,

doi = "10.1016/j.cja.2024.03.008",

language = "English",

volume = "37",

pages = "406--417",

journal = "Chinese Journal of Aeronautics",

issn = "1000-9361",

publisher = "Elsevier B.V.",

number = "7",

}

TY - JOUR

T1 - Controlling underestimation bias in reinforcement learning via minmax operation

AU - HUANG, Fanghui

AU - HE, Yixin

AU - ZHANG, Yu

AU - DENG, Xinyang

AU - JIANG, Wen

PY - 2024/7

Y1 - 2024/7

AB - Obtaining the accurate value estimation and reducing the estimation bias are the key issues in reinforcement learning. However, current methods that address the overestimation problem tend to introduce underestimation, which face a challenge of precise decision-making in many fields. To address this issue, we conduct a theoretical analysis of the underestimation bias and propose the minmax operation, which allow for flexible control of the estimation bias. Specifically, we select the maximum value of each action from multiple parallel state-action networks to create a new state-action value sequence. Then, a minimum value is selected to obtain more accurate value estimations. Moreover, based on the minmax operation, we propose two novel algorithms by combining Deep Q-Network (DQN) and Double DQN (DDQN), named minmax-DQN and minmax-DDQN. Meanwhile, we conduct theoretical analyses of the estimation bias and variance caused by our proposed minmax operation, which show that this operation significantly improves both underestimation and overestimation biases and leads to the unbiased estimation. Furthermore, the variance is also reduced, which is helpful to improve the network training stability. Finally, we conduct numerous comparative experiments in various environments, which empirically demonstrate the superiority of our method.

KW - Estimation bias

KW - Minmax operation

KW - Reinforcement learning

KW - Underestimation bias

KW - Variance

UR - http://www.scopus.com/inward/record.url?scp=85191605712&partnerID=8YFLogxK

U2 - 10.1016/j.cja.2024.03.008

DO - 10.1016/j.cja.2024.03.008

M3 - Article

AN - SCOPUS:85191605712

SN - 1000-9361

VL - 37

SP - 406

EP - 417

JO - Chinese Journal of Aeronautics

JF - Chinese Journal of Aeronautics

IS - 7

ER -
