TY - JOUR
T1 - Pessimistic value iteration for multi-task data sharing in offline reinforcement learning
AU - Bai, Chenjia
AU - Wang, Lingxiao
AU - Hao, Jianye
AU - Yang, Zhuoran
AU - Zhao, Bin
AU - Wang, Zhen
AU - Li, Xuelong
N1 - Publisher Copyright:
© 2023 Elsevier B.V.
PY - 2024/1
Y1 - 2024/1
N2 - Offline Reinforcement Learning (RL) has shown promising results in learning a task-specific policy from a fixed dataset. However, successful offline RL often relies heavily on the coverage and quality of the given dataset. In scenarios where the dataset for a specific task is limited, a natural approach is to improve offline RL with datasets from other tasks, namely, to conduct Multi-Task Data Sharing (MTDS). Nevertheless, directly sharing datasets from other tasks exacerbates the distribution shift in offline RL. In this paper, we propose an uncertainty-based MTDS approach that shares the entire dataset without data selection. Given ensemble-based uncertainty quantification, we perform pessimistic value iteration on the shared offline dataset, which provides a unified framework for single- and multi-task offline RL. We further provide a theoretical analysis, which shows that the optimality gap of our method is related only to the expected data coverage of the shared dataset, thus resolving the distribution shift issue in data sharing. Empirically, we release an MTDS benchmark and collect datasets from three challenging domains. The experimental results show that our algorithm outperforms previous state-of-the-art methods on challenging MTDS problems.
AB - Offline Reinforcement Learning (RL) has shown promising results in learning a task-specific policy from a fixed dataset. However, successful offline RL often relies heavily on the coverage and quality of the given dataset. In scenarios where the dataset for a specific task is limited, a natural approach is to improve offline RL with datasets from other tasks, namely, to conduct Multi-Task Data Sharing (MTDS). Nevertheless, directly sharing datasets from other tasks exacerbates the distribution shift in offline RL. In this paper, we propose an uncertainty-based MTDS approach that shares the entire dataset without data selection. Given ensemble-based uncertainty quantification, we perform pessimistic value iteration on the shared offline dataset, which provides a unified framework for single- and multi-task offline RL. We further provide a theoretical analysis, which shows that the optimality gap of our method is related only to the expected data coverage of the shared dataset, thus resolving the distribution shift issue in data sharing. Empirically, we release an MTDS benchmark and collect datasets from three challenging domains. The experimental results show that our algorithm outperforms previous state-of-the-art methods on challenging MTDS problems.
KW - Data sharing
KW - Offline Reinforcement Learning
KW - Pessimistic value iteration
KW - Uncertainty quantification
UR - http://www.scopus.com/inward/record.url?scp=85177617269&partnerID=8YFLogxK
U2 - 10.1016/j.artint.2023.104048
DO - 10.1016/j.artint.2023.104048
M3 - Article
AN - SCOPUS:85177617269
SN - 0004-3702
VL - 326
JO - Artificial Intelligence
JF - Artificial Intelligence
M1 - 104048
ER -