Model-Based Offline Adaptive Policy Optimization with Episodic Memory

Hongye Cao; Qianru Wei; Jiangbin Zheng; Yanqing Shi

doi:10.1007/978-3-031-15931-2_5

School of Software

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Peer-review

2 Citations (Scopus)

Abstract

Offline reinforcement learning (RL) is a promising way to apply RL to real-world problems because it avoids expensive and dangerous online exploration. However, offline RL is challenging because of extrapolation errors caused by the distribution shift between the offline dataset, collected by the behavior policy, and the states visited by the learned policy. Existing model-based offline RL methods impose pessimistic constraints that keep the learned model within the support region of the offline data to avoid extrapolation errors, but these constraints limit the policy's potential to generalize to out-of-distribution (OOD) regions. In addition, the hand-crafted, fixed uncertainty estimates used by existing methods, together with the sparse rewards of low-quality datasets, make these methods poorly adaptable to different learning tasks. This work therefore proposes model-based offline adaptive policy optimization with episodic memory to improve the generalization of the policy. Inspired by active learning, a constraint strength is introduced that adaptively trades off return against risk, balancing the robustness and generalization ability of the policy. Furthermore, episodic memory is used to capture successful experiences and improve adaptability. Extensive experiments on the D4RL datasets show that the proposed method outperforms existing state-of-the-art methods and achieves superior performance on challenging tasks that require OOD generalization.
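To make the "pessimistic constraints" and "fixed uncertainty calculation" mentioned in the abstract concrete, here is a minimal sketch of one common form of them, under stated assumptions: the reward produced by a learned model is reduced by a penalty proportional to the disagreement of an ensemble of dynamics models, scaled by a fixed, hand-chosen coefficient `lam`. This is an illustration of the general idea only, not the method of this paper. The function names `ensemble_disagreement` and `penalized_reward` are hypothetical, and the paper's adaptive constraint strength, which replaces such a fixed trade-off, is not specified in this record.

```python
from statistics import pstdev


def ensemble_disagreement(predictions):
    """Uncertainty estimate u(s, a) from an ensemble of dynamics models.

    `predictions` holds one predicted next state (a list of floats) per
    ensemble member. The estimate is the largest per-dimension population
    standard deviation across members (a common heuristic, assumed here).
    """
    num_dims = len(predictions[0])
    return max(
        pstdev([member[d] for member in predictions]) for d in range(num_dims)
    )


def penalized_reward(reward, predictions, lam=1.0):
    """Pessimistic reward r - lam * u(s, a) with a fixed coefficient `lam`.

    Larger disagreement (typically for state-action pairs far from the
    offline data) yields a lower reward, discouraging the policy from
    exploiting model errors outside the data support.
    """
    return reward - lam * ensemble_disagreement(predictions)


# Two hypothetical ensemble members predicting a 2-D next state.
preds = [[1.0, 0.0], [1.0, 2.0]]
print(penalized_reward(5.0, preds, lam=0.5))  # 4.5
```

Here the two members agree on the first dimension but differ by a standard deviation of 1.0 on the second, so a reward of 5.0 with `lam = 0.5` becomes 4.5. Because `lam` is fixed, the same penalty applies regardless of task or dataset quality, which is the kind of rigidity the abstract argues against.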

Original language	English
Host publication title	Artificial Neural Networks and Machine Learning - ICANN 2022 - 31st International Conference on Artificial Neural Networks, Proceedings
Editors	Elias Pimenidis, Mehmet Aydin, Plamen Angelov, Chrisina Jayne, Antonios Papaleonidas
Publisher	Springer Science and Business Media Deutschland GmbH
Pages	50-62
Number of pages	13
ISBN (Print)	9783031159305
DOI	https://doi.org/10.1007/978-3-031-15931-2_5
Publication status	Published - 2022
Event	31st International Conference on Artificial Neural Networks, ICANN 2022 - Bristol, United Kingdom. Duration: 6 Sep 2022 → 9 Sep 2022

Publication series

Name	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume	13530 LNCS
ISSN (Print)	0302-9743
ISSN (Electronic)	1611-3349

Conference

Conference	31st International Conference on Artificial Neural Networks, ICANN 2022
Country/Territory	United Kingdom
City	Bristol
Period	6/09/22 → 9/09/22

Access to Document

10.1007/978-3-031-15931-2_5

Other files and links

Link to publication in Scopus

Cite this

Cao, H., Wei, Q., Zheng, J., & Shi, Y. (2022). Model-Based Offline Adaptive Policy Optimization with Episodic Memory. In E. Pimenidis, M. Aydin, P. Angelov, C. Jayne, & A. Papaleonidas (Eds.), Artificial Neural Networks and Machine Learning - ICANN 2022 - 31st International Conference on Artificial Neural Networks, Proceedings (pp. 50-62). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 13530 LNCS). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-15931-2_5

Cao, Hongye ; Wei, Qianru ; Zheng, Jiangbin et al. / Model-Based Offline Adaptive Policy Optimization with Episodic Memory. Artificial Neural Networks and Machine Learning - ICANN 2022 - 31st International Conference on Artificial Neural Networks, Proceedings. ed. / Elias Pimenidis ; Mehmet Aydin ; Plamen Angelov ; Chrisina Jayne ; Antonios Papaleonidas. Springer Science and Business Media Deutschland GmbH, 2022. pp. 50-62 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).

@inproceedings{d1bedf5e18df4b35a312c669f279a974,
  title = "Model-Based Offline Adaptive Policy Optimization with Episodic Memory",
  abstract = "Offline reinforcement learning (RL) is a promising direction to apply RL to real-world by avoiding online expensive and dangerous exploration. However, offline RL is challenging due to extrapolation errors caused by the distribution shift between offline datasets and states visited by behavior policy. Existing model-based offline RL methods set pessimistic constraints of the learned model within the support region of the offline data to avoid extrapolation errors, but these approaches limit the generalization potential of the policy in out-of-distribution (OOD) region. The artificial fixed uncertainty calculation and the sparse reward problem of low-quality datasets in existing methods have weak adaptability to different learning tasks. Hence, a model-based offline adaptive policy optimization with episodic memory is proposed in this work to improve generalization of the policy. Inspired by active learning, constraint strength is proposed to trade off the return and risk adaptively to balance the robustness and generalization ability of the policy. Further, episodic memory is applied to capture successful experience to improve adaptability. Extensive experiments on D4RL datasets demonstrate that the proposed method outperforms existing state-of-the-art methods and achieves superior performance on challenging tasks requiring OOD generalization.",
  keywords = "Constraint strength, Episodic memory, Offline reinforcement learning",
  author = "Hongye Cao and Qianru Wei and Jiangbin Zheng and Yanqing Shi",
  note = "Publisher Copyright: {\textcopyright} 2022, The Author(s), under exclusive license to Springer Nature Switzerland AG.; 31st International Conference on Artificial Neural Networks, ICANN 2022 ; Conference date: 06-09-2022 Through 09-09-2022",
  year = "2022",
  doi = "10.1007/978-3-031-15931-2_5",
  language = "English",
  isbn = "9783031159305",
  series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
  volume = "13530",
  publisher = "Springer Science and Business Media Deutschland GmbH",
  pages = "50--62",
  editor = "Elias Pimenidis and Mehmet Aydin and Plamen Angelov and Chrisina Jayne and Antonios Papaleonidas",
  booktitle = "Artificial Neural Networks and Machine Learning - ICANN 2022 - 31st International Conference on Artificial Neural Networks, Proceedings",
}

Cao, H, Wei, Q, Zheng, J & Shi, Y 2022, Model-Based Offline Adaptive Policy Optimization with Episodic Memory. in E Pimenidis, M Aydin, P Angelov, C Jayne & A Papaleonidas (eds), Artificial Neural Networks and Machine Learning - ICANN 2022 - 31st International Conference on Artificial Neural Networks, Proceedings. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 13530 LNCS, Springer Science and Business Media Deutschland GmbH, pp. 50-62, 31st International Conference on Artificial Neural Networks, ICANN 2022, Bristol, United Kingdom, 6/09/22. https://doi.org/10.1007/978-3-031-15931-2_5

Model-Based Offline Adaptive Policy Optimization with Episodic Memory. / Cao, Hongye; Wei, Qianru; Zheng, Jiangbin et al.
Artificial Neural Networks and Machine Learning - ICANN 2022 - 31st International Conference on Artificial Neural Networks, Proceedings. ed. / Elias Pimenidis; Mehmet Aydin; Plamen Angelov; Chrisina Jayne; Antonios Papaleonidas. Springer Science and Business Media Deutschland GmbH, 2022. pp. 50-62 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 13530 LNCS).

TY  - GEN
T1  - Model-Based Offline Adaptive Policy Optimization with Episodic Memory
AU  - Cao, Hongye
AU  - Wei, Qianru
AU  - Zheng, Jiangbin
AU  - Shi, Yanqing
PY  - 2022
Y1  - 2022
N2  - Offline reinforcement learning (RL) is a promising direction to apply RL to real-world by avoiding online expensive and dangerous exploration. However, offline RL is challenging due to extrapolation errors caused by the distribution shift between offline datasets and states visited by behavior policy. Existing model-based offline RL methods set pessimistic constraints of the learned model within the support region of the offline data to avoid extrapolation errors, but these approaches limit the generalization potential of the policy in out-of-distribution (OOD) region. The artificial fixed uncertainty calculation and the sparse reward problem of low-quality datasets in existing methods have weak adaptability to different learning tasks. Hence, a model-based offline adaptive policy optimization with episodic memory is proposed in this work to improve generalization of the policy. Inspired by active learning, constraint strength is proposed to trade off the return and risk adaptively to balance the robustness and generalization ability of the policy. Further, episodic memory is applied to capture successful experience to improve adaptability. Extensive experiments on D4RL datasets demonstrate that the proposed method outperforms existing state-of-the-art methods and achieves superior performance on challenging tasks requiring OOD generalization.
AB  - Offline reinforcement learning (RL) is a promising direction to apply RL to real-world by avoiding online expensive and dangerous exploration. However, offline RL is challenging due to extrapolation errors caused by the distribution shift between offline datasets and states visited by behavior policy. Existing model-based offline RL methods set pessimistic constraints of the learned model within the support region of the offline data to avoid extrapolation errors, but these approaches limit the generalization potential of the policy in out-of-distribution (OOD) region. The artificial fixed uncertainty calculation and the sparse reward problem of low-quality datasets in existing methods have weak adaptability to different learning tasks. Hence, a model-based offline adaptive policy optimization with episodic memory is proposed in this work to improve generalization of the policy. Inspired by active learning, constraint strength is proposed to trade off the return and risk adaptively to balance the robustness and generalization ability of the policy. Further, episodic memory is applied to capture successful experience to improve adaptability. Extensive experiments on D4RL datasets demonstrate that the proposed method outperforms existing state-of-the-art methods and achieves superior performance on challenging tasks requiring OOD generalization.
KW  - Constraint strength
KW  - Episodic memory
KW  - Offline reinforcement learning
UR  - http://www.scopus.com/inward/record.url?scp=85138690595&partnerID=8YFLogxK
U2  - 10.1007/978-3-031-15931-2_5
DO  - 10.1007/978-3-031-15931-2_5
M3  - Conference contribution
AN  - SCOPUS:85138690595
SN  - 9783031159305
T3  - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
VL  - 13530
SP  - 50
EP  - 62
BT  - Artificial Neural Networks and Machine Learning - ICANN 2022 - 31st International Conference on Artificial Neural Networks, Proceedings
A2  - Pimenidis, Elias
A2  - Aydin, Mehmet
A2  - Angelov, Plamen
A2  - Jayne, Chrisina
A2  - Papaleonidas, Antonios
PB  - Springer Science and Business Media Deutschland GmbH
T2  - 31st International Conference on Artificial Neural Networks, ICANN 2022
Y2  - 6 September 2022 through 9 September 2022
ER  - 

Cao H, Wei Q, Zheng J, Shi Y. Model-Based Offline Adaptive Policy Optimization with Episodic Memory. In Pimenidis E, Aydin M, Angelov P, Jayne C, Papaleonidas A, editors, Artificial Neural Networks and Machine Learning - ICANN 2022 - 31st International Conference on Artificial Neural Networks, Proceedings. Springer Science and Business Media Deutschland GmbH. 2022. p. 50-62. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). doi: 10.1007/978-3-031-15931-2_5