TY - GEN
T1 - Hybrid Reinforcement Learning based on Human Preference and Advice for Efficient Robot Skill Learning
AU - Li, Bingqian
AU - Liu, Xing
AU - Liu, Zhengxiong
AU - Huang, Panfeng
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - The key to realizing real-world robot applications is designing intelligent robots with a degree of autonomous skill-learning ability. Reinforcement learning (RL) is a feasible solution; however, two major challenges limit the application of RL methods in robotics: the difficulty of hand-designing reward functions and long training times. We therefore study hybrid RL methods, which use human knowledge to assist agent learning. First, we propose a reward learning method based on a human preference model to realize robot skill learning, which achieves better robustness and convergence than traditional RL with a human-designed reward. We then combine it with Episode-Fuzzy-COACH, our previous work, to build a hybrid RL method based on human preference and advice. In this method, the preference model is used to infer the reward function, and human advice is used to speed up the policy-learning process. It realizes efficient robot skill learning without a human-designed reward function, and its learning efficiency is shown to be 73.3% higher than that of the reward learning method that uses only the preference model.
AB - The key to realizing real-world robot applications is designing intelligent robots with a degree of autonomous skill-learning ability. Reinforcement learning (RL) is a feasible solution; however, two major challenges limit the application of RL methods in robotics: the difficulty of hand-designing reward functions and long training times. We therefore study hybrid RL methods, which use human knowledge to assist agent learning. First, we propose a reward learning method based on a human preference model to realize robot skill learning, which achieves better robustness and convergence than traditional RL with a human-designed reward. We then combine it with Episode-Fuzzy-COACH, our previous work, to build a hybrid RL method based on human preference and advice. In this method, the preference model is used to infer the reward function, and human advice is used to speed up the policy-learning process. It realizes efficient robot skill learning without a human-designed reward function, and its learning efficiency is shown to be 73.3% higher than that of the reward learning method that uses only the preference model.
UR - http://www.scopus.com/inward/record.url?scp=85208034704&partnerID=8YFLogxK
U2 - 10.1109/ICARM62033.2024.10715884
DO - 10.1109/ICARM62033.2024.10715884
M3 - Conference contribution
AN - SCOPUS:85208034704
T3 - ICARM 2024 - 2024 9th IEEE International Conference on Advanced Robotics and Mechatronics
SP - 655
EP - 661
BT - ICARM 2024 - 2024 9th IEEE International Conference on Advanced Robotics and Mechatronics
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 9th IEEE International Conference on Advanced Robotics and Mechatronics, ICARM 2024
Y2 - 8 July 2024 through 10 July 2024
ER -