Model-Based Offline Adaptive Policy Optimization with Episodic Memory

Hongye Cao, Qianru Wei, Jiangbin Zheng, Yanqing Shi

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

2 Citations (Scopus)

Abstract

Offline reinforcement learning (RL) is a promising direction for applying RL to real-world problems because it avoids expensive and dangerous online exploration. However, offline RL is challenging due to extrapolation errors caused by the distribution shift between the offline dataset and the states visited by the behavior policy. Existing model-based offline RL methods impose pessimistic constraints that keep the learned model within the support region of the offline data to avoid extrapolation errors, but these approaches limit the generalization potential of the policy in out-of-distribution (OOD) regions. In addition, the hand-crafted, fixed uncertainty calculation in existing methods, together with the sparse-reward problem of low-quality datasets, adapts poorly to different learning tasks. Hence, this work proposes a model-based offline adaptive policy optimization method with episodic memory to improve the generalization of the policy. Inspired by active learning, a constraint strength is introduced to trade off return and risk adaptively, balancing the robustness and generalization ability of the policy. Further, episodic memory is applied to capture successful experience and improve adaptability. Extensive experiments on D4RL datasets demonstrate that the proposed method outperforms existing state-of-the-art methods and achieves superior performance on challenging tasks requiring OOD generalization.

Original language: English
Title of host publication: Artificial Neural Networks and Machine Learning - ICANN 2022 - 31st International Conference on Artificial Neural Networks, Proceedings
Editors: Elias Pimenidis, Mehmet Aydin, Plamen Angelov, Chrisina Jayne, Antonios Papaleonidas
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 50-62
Number of pages: 13
ISBN (Print): 9783031159305
DOI
Publication status: Published - 2022
Event: 31st International Conference on Artificial Neural Networks, ICANN 2022 - Bristol, United Kingdom
Duration: 6 Sep 2022 to 9 Sep 2022

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 13530 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 31st International Conference on Artificial Neural Networks, ICANN 2022
Country/Territory: United Kingdom
City: Bristol
Period: 6/09/22 to 9/09/22
