Hybrid Reinforcement Learning based on Human Preference and Advice for Efficient Robot Skill Learning

Bingqian Li; Xing Liu; Zhengxiong Liu; Panfeng Huang

doi:10.1109/ICARM62033.2024.10715884

School of Astronautics

Northwestern Polytechnical University Xian

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

The key to applying robots in the real world is to design intelligent robots with a degree of autonomous skill learning ability. Reinforcement learning (RL) is a feasible solution. However, two important challenges limit the application of RL methods in robotics: the difficulty of designing reward functions by hand and the long training time. We therefore study hybrid RL methods, which use human knowledge to assist agent learning. First, we propose a reward learning method based on a human preference model for robot skill learning, which has better robustness and convergence than the traditional RL method with a human-designed reward. Then, we combine it with Episode-Fuzzy-COACH, our previous work, to build a hybrid RL method based on human preference and advice. In this method, the preference model is used to infer the reward function, and human advice is used to speed up the policy learning process. The method achieves efficient robot skill learning without a human-designed reward function. Its learning efficiency is shown to be 73.3% higher than that of the reward learning method that uses only the preference model.
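The abstract does not specify the form of the preference model. As a rough illustration of how pairwise human preferences can be turned into a reward function, the sketch below assumes a Bradley-Terry preference model over trajectory segments and a linear reward in hand-picked features. Both are common choices in preference-based RL but are assumptions here, not the authors' method. All names (`preference_prob`, `fit_reward`, `make_synthetic_preferences`) are hypothetical, and the "human" labels are simulated from a hidden reward.

```python
import math
import random


def features_sum(segment, dim):
    """Sum each feature over all steps of a trajectory segment."""
    return [sum(step[i] for step in segment) for i in range(dim)]


def preference_prob(weights, seg_a, seg_b):
    """Bradley-Terry probability that seg_a is preferred to seg_b,
    assuming a linear reward r(x) = weights . x summed over each segment."""
    dim = len(weights)
    phi_a = features_sum(seg_a, dim)
    phi_b = features_sum(seg_b, dim)
    diff = sum(w * (a - b) for w, a, b in zip(weights, phi_a, phi_b))
    diff = max(-30.0, min(30.0, diff))  # clamp so exp() stays well-behaved
    return 1.0 / (1.0 + math.exp(-diff))


def fit_reward(preferences, dim, lr=0.1, epochs=200):
    """Fit reward weights by stochastic gradient ascent on the
    log-likelihood of the labelled pairs.

    preferences: list of (seg_a, seg_b, label), with label 1.0 if the
    labeller preferred seg_a and 0.0 if they preferred seg_b.
    """
    weights = [0.0] * dim
    for _ in range(epochs):
        for seg_a, seg_b, label in preferences:
            p = preference_prob(weights, seg_a, seg_b)
            phi_a = features_sum(seg_a, dim)
            phi_b = features_sum(seg_b, dim)
            # Gradient of the log-likelihood: (label - p) * (phi_a - phi_b)
            for i in range(dim):
                weights[i] += lr * (label - p) * (phi_a[i] - phi_b[i])
    return weights


def make_synthetic_preferences(true_weights, n_pairs, seed=0, steps=3):
    """Simulate a labeller who always prefers the segment with the higher
    return under a hidden linear reward (an assumption for this demo)."""
    rng = random.Random(seed)
    dim = len(true_weights)

    def random_segment():
        return [[rng.uniform(-1.0, 1.0) for _ in range(dim)]
                for _ in range(steps)]

    def true_return(segment):
        phi = features_sum(segment, dim)
        return sum(w * f for w, f in zip(true_weights, phi))

    pairs = []
    for _ in range(n_pairs):
        seg_a, seg_b = random_segment(), random_segment()
        label = 1.0 if true_return(seg_a) > true_return(seg_b) else 0.0
        pairs.append((seg_a, seg_b, label))
    return pairs
```

Calling `fit_reward(make_synthetic_preferences([1.0, -0.5], n_pairs=40), dim=2)` returns weights that could stand in for a hand-designed reward in any standard RL algorithm. The human-advice component (Episode-Fuzzy-COACH) is not sketched here, since the abstract gives no detail on how advice is applied.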

Original language	English
Title of host publication	ICARM 2024 - 2024 9th IEEE International Conference on Advanced Robotics and Mechatronics
Publisher	Institute of Electrical and Electronics Engineers Inc.
Pages	655-661
Number of pages	7
ISBN (Electronic)	9798350385724
DOIs	https://doi.org/10.1109/ICARM62033.2024.10715884
State	Published - 2024
Event	9th IEEE International Conference on Advanced Robotics and Mechatronics, ICARM 2024 - Tokyo, Japan (8 Jul 2024 → 10 Jul 2024)

Publication series

Name	ICARM 2024 - 2024 9th IEEE International Conference on Advanced Robotics and Mechatronics

Conference

Conference	9th IEEE International Conference on Advanced Robotics and Mechatronics, ICARM 2024
Country/Territory	Japan
City	Tokyo
Period	8/07/24 → 10/07/24

Cite this

Li, B., Liu, X., Liu, Z., & Huang, P. (2024). Hybrid Reinforcement Learning based on Human Preference and Advice for Efficient Robot Skill Learning. In ICARM 2024 - 2024 9th IEEE International Conference on Advanced Robotics and Mechatronics (pp. 655-661). (ICARM 2024 - 2024 9th IEEE International Conference on Advanced Robotics and Mechatronics). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICARM62033.2024.10715884

Li, Bingqian ; Liu, Xing ; Liu, Zhengxiong et al. / Hybrid Reinforcement Learning based on Human Preference and Advice for Efficient Robot Skill Learning. ICARM 2024 - 2024 9th IEEE International Conference on Advanced Robotics and Mechatronics. Institute of Electrical and Electronics Engineers Inc., 2024. pp. 655-661 (ICARM 2024 - 2024 9th IEEE International Conference on Advanced Robotics and Mechatronics).

@inproceedings{92c9a729ac804803825d33039292f093,

title = "Hybrid Reinforcement Learning based on Human Preference and Advice for Efficient Robot Skill Learning",

abstract = "The key to applying robots in the real world is to design intelligent robots with a degree of autonomous skill learning ability. Reinforcement learning (RL) is a feasible solution. However, two important challenges limit the application of RL methods in robotics: the difficulty of designing reward functions by hand and the long training time. We therefore study hybrid RL methods, which use human knowledge to assist agent learning. First, we propose a reward learning method based on a human preference model for robot skill learning, which has better robustness and convergence than the traditional RL method with a human-designed reward. Then, we combine it with Episode-Fuzzy-COACH, our previous work, to build a hybrid RL method based on human preference and advice. In this method, the preference model is used to infer the reward function, and human advice is used to speed up the policy learning process. The method achieves efficient robot skill learning without a human-designed reward function. Its learning efficiency is shown to be 73.3% higher than that of the reward learning method that uses only the preference model.",

author = "Bingqian Li and Xing Liu and Zhengxiong Liu and Panfeng Huang",

note = "Publisher Copyright: {\textcopyright} 2024 IEEE.; 9th IEEE International Conference on Advanced Robotics and Mechatronics, ICARM 2024 ; Conference date: 08-07-2024 Through 10-07-2024",

year = "2024",

doi = "10.1109/ICARM62033.2024.10715884",

language = "English",

series = "ICARM 2024 - 2024 9th IEEE International Conference on Advanced Robotics and Mechatronics",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

pages = "655--661",

booktitle = "ICARM 2024 - 2024 9th IEEE International Conference on Advanced Robotics and Mechatronics",

}

Li, B, Liu, X, Liu, Z & Huang, P 2024, Hybrid Reinforcement Learning based on Human Preference and Advice for Efficient Robot Skill Learning. in ICARM 2024 - 2024 9th IEEE International Conference on Advanced Robotics and Mechatronics. ICARM 2024 - 2024 9th IEEE International Conference on Advanced Robotics and Mechatronics, Institute of Electrical and Electronics Engineers Inc., pp. 655-661, 9th IEEE International Conference on Advanced Robotics and Mechatronics, ICARM 2024, Tokyo, Japan, 8/07/24. https://doi.org/10.1109/ICARM62033.2024.10715884

Hybrid Reinforcement Learning based on Human Preference and Advice for Efficient Robot Skill Learning. / Li, Bingqian; Liu, Xing; Liu, Zhengxiong et al.
ICARM 2024 - 2024 9th IEEE International Conference on Advanced Robotics and Mechatronics. Institute of Electrical and Electronics Engineers Inc., 2024. p. 655-661 (ICARM 2024 - 2024 9th IEEE International Conference on Advanced Robotics and Mechatronics).

TY - GEN

T1 - Hybrid Reinforcement Learning based on Human Preference and Advice for Efficient Robot Skill Learning

AU - Li, Bingqian

AU - Liu, Xing

AU - Liu, Zhengxiong

AU - Huang, Panfeng

PY - 2024

Y1 - 2024

N2 - The key to applying robots in the real world is to design intelligent robots with a degree of autonomous skill learning ability. Reinforcement learning (RL) is a feasible solution. However, two important challenges limit the application of RL methods in robotics: the difficulty of designing reward functions by hand and the long training time. We therefore study hybrid RL methods, which use human knowledge to assist agent learning. First, we propose a reward learning method based on a human preference model for robot skill learning, which has better robustness and convergence than the traditional RL method with a human-designed reward. Then, we combine it with Episode-Fuzzy-COACH, our previous work, to build a hybrid RL method based on human preference and advice. In this method, the preference model is used to infer the reward function, and human advice is used to speed up the policy learning process. The method achieves efficient robot skill learning without a human-designed reward function. Its learning efficiency is shown to be 73.3% higher than that of the reward learning method that uses only the preference model.

UR - http://www.scopus.com/inward/record.url?scp=85208034704&partnerID=8YFLogxK

U2 - 10.1109/ICARM62033.2024.10715884

DO - 10.1109/ICARM62033.2024.10715884

M3 - Conference contribution

AN - SCOPUS:85208034704

T3 - ICARM 2024 - 2024 9th IEEE International Conference on Advanced Robotics and Mechatronics

SP - 655

EP - 661

BT - ICARM 2024 - 2024 9th IEEE International Conference on Advanced Robotics and Mechatronics

PB - Institute of Electrical and Electronics Engineers Inc.

T2 - 9th IEEE International Conference on Advanced Robotics and Mechatronics, ICARM 2024

Y2 - 8 July 2024 through 10 July 2024

ER -

Li B, Liu X, Liu Z, Huang P. Hybrid Reinforcement Learning based on Human Preference and Advice for Efficient Robot Skill Learning. In ICARM 2024 - 2024 9th IEEE International Conference on Advanced Robotics and Mechatronics. Institute of Electrical and Electronics Engineers Inc. 2024. p. 655-661. (ICARM 2024 - 2024 9th IEEE International Conference on Advanced Robotics and Mechatronics). doi: 10.1109/ICARM62033.2024.10715884
