Hybrid Reinforcement Learning based on Human Preference and Advice for Efficient Robot Skill Learning

Bingqian Li, Xing Liu, Zhengxiong Liu, Panfeng Huang

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

Abstract

The key to deploying robots in the real world is to design intelligent robots with a degree of autonomous skill-learning ability. Reinforcement learning (RL) is a feasible solution. However, two important challenges limit the application of RL methods in robotics: the difficulty of designing reward functions by hand and long training times. Therefore, we study hybrid RL methods, which use human knowledge to assist agent learning. First, we propose a reward learning method based on a human preference model to realize robot skill learning, which has better robustness and convergence than the traditional RL method with a human-designed reward. Then, we combine it with Episode-Fuzzy-COACH, our previous work, to build a hybrid RL method based on human preference and advice. In this method, the preference model is used to infer the reward function, and human advice is used to speed up the policy learning process. The method realizes efficient robot skill learning without a human-designed reward function. The learning efficiency of this method is shown to be 73.3% higher than that of the reward learning method that uses only the preference model.
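
The abstract does not say which preference model the authors use. As a rough illustration only, the sketch below assumes the Bradley-Terry formulation that is common in preference-based reward learning: the probability that a human prefers one trajectory segment over another is a logistic function of the difference in their summed predicted rewards, and the reward model is fitted by minimizing a cross-entropy loss on the human labels. All names here (`reward_fn`, `segment_return`, `preference_probability`, `preference_loss`) are hypothetical and are not taken from the paper.

```python
import math


def _sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def segment_return(reward_fn, segment):
    """Sum the predicted reward over the (state, action) pairs of a segment."""
    return sum(reward_fn(state, action) for state, action in segment)


def preference_probability(reward_fn, seg_a, seg_b):
    """Assumed Bradley-Terry model: probability that seg_a is preferred to seg_b."""
    diff = segment_return(reward_fn, seg_a) - segment_return(reward_fn, seg_b)
    return _sigmoid(diff)


def preference_loss(reward_fn, seg_a, seg_b, label):
    """Cross-entropy loss for one comparison.

    label is 1.0 if the human preferred seg_a and 0.0 if they preferred seg_b.
    """
    p = preference_probability(reward_fn, seg_a, seg_b)
    eps = 1e-12  # guards against log(0)
    return -(label * math.log(p + eps) + (1.0 - label) * math.log(1.0 - p + eps))
```

In such a setup, a learned reward model would take the place of `reward_fn`, and the RL agent would be trained on its predictions instead of a hand-designed reward. How the human advice from Episode-Fuzzy-COACH enters the policy update is not described in the abstract and is not sketched here.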

Original language: English
Title of host publication: ICARM 2024 - 2024 9th IEEE International Conference on Advanced Robotics and Mechatronics
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 655-661
Number of pages: 7
ISBN (Electronic): 9798350385724
DOIs
State: Published - 2024
Event: 9th IEEE International Conference on Advanced Robotics and Mechatronics, ICARM 2024 - Tokyo, Japan
Duration: 8 Jul 2024 to 10 Jul 2024

Publication series

Name: ICARM 2024 - 2024 9th IEEE International Conference on Advanced Robotics and Mechatronics

Conference

Conference: 9th IEEE International Conference on Advanced Robotics and Mechatronics, ICARM 2024
Country/Territory: Japan
City: Tokyo
Period: 8/07/24 to 10/07/24
