Cross-Domain Policy Adaptation via Value-Guided Data Filtering

Kang Xu, Chenjia Bai, Xiaoteng Ma, Dong Wang, Bin Zhao, Zhen Wang, Xuelong Li, Wei Li

Research output: Contribution to journal › Conference article › peer-review

7 Citations (Scopus)

Abstract

Generalizing policies across different domains with dynamics mismatch poses a significant challenge in reinforcement learning. For example, a robot learns a policy in a simulator, but when it is deployed in the real world, the dynamics of the environment may be different. Given a source and a target domain with dynamics mismatch, we consider the online dynamics adaptation problem, in which the agent can access sufficient source domain data while online interactions with the target domain are limited. Existing research has attempted to solve the problem from the dynamics discrepancy perspective. In this work, we reveal the limitations of these methods and explore the problem from the value difference perspective via a novel insight on the value consistency across domains. Specifically, we present the Value-Guided Data Filtering (VGDF) algorithm, which selectively shares transitions from the source domain based on the proximity of paired value targets across the two domains. Empirical results on various environments with kinematic and morphology shifts demonstrate that our method achieves superior performance compared to prior approaches.
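The filtering idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `value_guided_filter`, the quantile-based threshold, the `keep_fraction` parameter, and the assumption that target-domain next states are already available (for example, from some learned dynamics model) are all hypothetical choices made for illustration. The sketch only shows the general pattern of comparing paired value targets and keeping the source transitions whose targets are closest.

```python
import numpy as np


def value_guided_filter(rewards, next_states_src, next_states_tar,
                        value_fn, gamma=0.99, keep_fraction=0.25):
    """Return a boolean mask selecting source transitions to share.

    Hypothetical sketch: every name and the quantile rule are assumptions,
    not taken from the paper.
    """
    # Value target using the next state observed in the source domain.
    target_src = rewards + gamma * value_fn(next_states_src)
    # Value target using the (assumed, e.g. model-predicted) next state
    # the same state-action pair would reach in the target domain.
    target_tar = rewards + gamma * value_fn(next_states_tar)
    # Proximity of the paired value targets.
    gap = np.abs(target_src - target_tar)
    # Keep the transitions whose gap is at or below the chosen quantile.
    threshold = np.quantile(gap, keep_fraction)
    return gap <= threshold


# Toy usage with a stand-in value function (sum of state features).
rewards = np.zeros(4)
next_src = np.array([[0.0], [1.0], [2.0], [3.0]])
next_tar = np.array([[0.0], [1.5], [2.1], [5.0]])
toy_value = lambda states: states.sum(axis=1)

mask = value_guided_filter(rewards, next_src, next_tar, toy_value,
                           gamma=1.0, keep_fraction=0.5)
print(mask)
```

In this toy case the two transitions with the smallest value-target gaps are kept and the other two are discarded. A real system would replace `toy_value` with a learned value estimate and would need some way to obtain target-domain next states, neither of which the abstract specifies.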

Original language: English
Journal: Advances in Neural Information Processing Systems
Volume: 36
Publication status: Published - 2023
Event: 37th Conference on Neural Information Processing Systems, NeurIPS 2023 - New Orleans, United States
Duration: 10 Dec 2023 to 16 Dec 2023
