Efficient deep reinforcement learning through policy transfer

Tianpei Yang, Jianye Hao, Zhaopeng Meng, Zongzhang Zhang, Yujing Hu, Yingfeng Chen, Changjie Fan, Weixun Wang, Zhaodong Wang, Jiajie Peng

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-reviewed

6 Citations (Scopus)

Abstract

Transfer Learning (TL) has shown great potential to accelerate Reinforcement Learning (RL) by leveraging prior knowledge from previously learned policies of relevant tasks. Existing TL approaches either explicitly compute the similarity between tasks or select appropriate source policies to provide guided exploration for the target task. However, an approach that directly optimizes the target policy by alternately utilizing knowledge from appropriate source policies, without explicitly measuring the similarity, is currently missing. In this paper, we propose a novel Policy Transfer Framework (PTF) that takes advantage of this idea. PTF learns when and which source policy is best to reuse for the target policy, and when to terminate it, by modeling multi-policy transfer as an option learning problem. PTF can be easily combined with existing deep RL approaches. Experimental results show that it significantly accelerates the learning process and outperforms state-of-the-art policy transfer methods in both discrete and continuous action spaces.

Original language: English
Title of host publication: Proceedings of the 19th International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2020
Editors: Bo An, Amal El Fallah Seghrouchni, Gita Sukthankar
Publisher: International Foundation for Autonomous Agents and Multiagent Systems (IFAAMAS)
Pages: 2053-2055
Number of pages: 3
ISBN (Electronic): 9781450375184
Publication status: Published - 2020
Event: 19th International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2020 - Virtual, Auckland, New Zealand
Duration: 19 May 2020 → …

Publication series

Name: Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS
Volume: 2020-May
ISSN (Print): 1548-8403
ISSN (Electronic): 1558-2914

Conference

Conference: 19th International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2020
Country/Territory: New Zealand
City: Virtual, Auckland
Period: 19/05/20 → …
