Hybrid Reinforcement Learning based on Human Preference and Advice for Efficient Robot Skill Learning

Bingqian Li; Xing Liu; Zhengxiong Liu; Panfeng Huang

doi:10.1109/ICARM62033.2024.10715884

School of Astronautics

Northwestern Polytechnical University Xian

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

The key to deploying robots in the real world is to design intelligent robots with a degree of autonomous skill-learning ability. Reinforcement learning (RL) is a feasible solution. However, two important challenges limit the application of RL methods in robotics: the difficulty of hand-designing reward functions and long training times. Therefore, we study hybrid RL methods, which use human knowledge to assist agent learning. First, we propose a reward learning method based on a human preference model to realize robot skill learning; it has better robustness and convergence than the traditional RL method with a human-designed reward. Then, we combine it with Episode-Fuzzy-COACH, our previous work, to build a hybrid RL method based on human preference and advice. In this method, the preference model is used to infer the reward function, and human advice is used to speed up the policy learning process. The method realizes efficient robot skill learning without a human-designed reward function. Its learning efficiency is shown to be 73.3% higher than that of the reward learning method that uses only the preference model.
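
The abstract does not state which preference model the authors use. As an illustration only, the sketch below assumes a Bradley-Terry model, a common choice in preference-based RL, with a hypothetical linear reward r(s) = theta · s fitted to pairwise preferences over trajectory segments. All names (`segment_return`, `preference_prob`, `fit_reward`), the linear reward form, and the synthetic data are assumptions for this example and are not taken from the paper.

```python
import numpy as np

def segment_return(theta, segment):
    """Sum of predicted rewards r(s) = theta . s over a (T, dim) array of state features."""
    return float(np.sum(segment @ theta))

def preference_prob(theta, seg_a, seg_b):
    """Bradley-Terry probability that seg_a is preferred over seg_b."""
    diff = segment_return(theta, seg_a) - segment_return(theta, seg_b)
    return 1.0 / (1.0 + np.exp(-diff))

def fit_reward(pairs, labels, dim, lr=0.1, steps=300):
    """Fit theta by gradient descent on the cross-entropy preference loss.

    labels[i] is 1.0 if the first segment of pairs[i] was preferred, else 0.0.
    """
    theta = np.zeros(dim)
    for _ in range(steps):
        grad = np.zeros(dim)
        for (seg_a, seg_b), y in zip(pairs, labels):
            p = preference_prob(theta, seg_a, seg_b)
            # Gradient of the loss w.r.t. theta: (p - y) * (feature sum of a - feature sum of b)
            grad += (p - y) * (seg_a.sum(axis=0) - seg_b.sum(axis=0))
        theta -= lr * grad / len(pairs)
    return theta

# Synthetic demo: a hidden linear reward stands in for the human labeler.
rng = np.random.default_rng(0)
true_theta = np.array([1.0, -0.5])  # hypothetical "human" reward weights
pairs = [(rng.normal(size=(5, 2)), rng.normal(size=(5, 2))) for _ in range(200)]
labels = [1.0 if segment_return(true_theta, a) > segment_return(true_theta, b) else 0.0
          for a, b in pairs]

theta = fit_reward(pairs, labels, dim=2)
# Fraction of training pairs whose preference the learned reward reproduces.
accuracy = np.mean([(preference_prob(theta, a, b) > 0.5) == (y == 1.0)
                    for (a, b), y in zip(pairs, labels)])
print("learned theta:", theta, "training agreement:", accuracy)
```

In a full pipeline the fitted reward would replace the hand-designed reward inside a standard RL loop. The human-advice component (Episode-Fuzzy-COACH) is not sketched here because the abstract gives no detail on its update rule.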

Original language	English
Title of host publication	ICARM 2024 - 2024 9th IEEE International Conference on Advanced Robotics and Mechatronics
Publisher	Institute of Electrical and Electronics Engineers Inc.
Pages	655-661
Number of pages	7
ISBN (Electronic)	9798350385724
DOI	https://doi.org/10.1109/ICARM62033.2024.10715884
Publication status	Published - 2024
Event	9th IEEE International Conference on Advanced Robotics and Mechatronics, ICARM 2024 - Tokyo, Japan. Duration: 8 Jul 2024 → 10 Jul 2024

Publication series

Name	ICARM 2024 - 2024 9th IEEE International Conference on Advanced Robotics and Mechatronics

Conference

Conference	9th IEEE International Conference on Advanced Robotics and Mechatronics, ICARM 2024
Country/Territory	Japan
City	Tokyo
Period	8/07/24 → 10/07/24

Access to Document

10.1109/ICARM62033.2024.10715884

Cite this

Li, B., Liu, X., Liu, Z., & Huang, P. (2024). Hybrid Reinforcement Learning based on Human Preference and Advice for Efficient Robot Skill Learning. In ICARM 2024 - 2024 9th IEEE International Conference on Advanced Robotics and Mechatronics (pp. 655-661). (ICARM 2024 - 2024 9th IEEE International Conference on Advanced Robotics and Mechatronics). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICARM62033.2024.10715884

Li, Bingqian ; Liu, Xing ; Liu, Zhengxiong et al. / Hybrid Reinforcement Learning based on Human Preference and Advice for Efficient Robot Skill Learning. ICARM 2024 - 2024 9th IEEE International Conference on Advanced Robotics and Mechatronics. Institute of Electrical and Electronics Engineers Inc., 2024. pp. 655-661 (ICARM 2024 - 2024 9th IEEE International Conference on Advanced Robotics and Mechatronics).

@inproceedings{92c9a729ac804803825d33039292f093,

title = "Hybrid Reinforcement Learning based on Human Preference and Advice for Efficient Robot Skill Learning",

abstract = "The key to realize the application of robots in real world is to design intelligent robots with certain autonomous skill learning ability. Reinforcement learning is a feasible solution. However, two important challenges limit the application of RL methods in robotics, including the difficulty of human-designed reward as well as long training time. Therefore, we study hybrid RL methods, which use human knowledge to assist agent learning. First, we propose a reward learning method based on human preference model to realize robot skill learning, which has better robustness and convergence than the traditional RL method with human-designed reward. Then, we combine it with Episode-Fuzzy-COACH, our previous work, to build a hybrid RL method based on human preference and advice. In this method, preference model is used to infer reward function and human advice is used to speed up the policy learning process. It realizes efficient robot skill learning without human-designed reward function. And it is proven the learning efficiency of this method is 73.3% higher than that of the reward learning method that only uses preference model.",

author = "Bingqian Li and Xing Liu and Zhengxiong Liu and Panfeng Huang",

note = "Publisher Copyright: {\textcopyright} 2024 IEEE.; 9th IEEE International Conference on Advanced Robotics and Mechatronics, ICARM 2024 ; Conference date: 08-07-2024 Through 10-07-2024",

year = "2024",

doi = "10.1109/ICARM62033.2024.10715884",

language = "English",

series = "ICARM 2024 - 2024 9th IEEE International Conference on Advanced Robotics and Mechatronics",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

pages = "655--661",

booktitle = "ICARM 2024 - 2024 9th IEEE International Conference on Advanced Robotics and Mechatronics",

}

Li, B, Liu, X, Liu, Z & Huang, P 2024, Hybrid Reinforcement Learning based on Human Preference and Advice for Efficient Robot Skill Learning. in ICARM 2024 - 2024 9th IEEE International Conference on Advanced Robotics and Mechatronics. ICARM 2024 - 2024 9th IEEE International Conference on Advanced Robotics and Mechatronics, Institute of Electrical and Electronics Engineers Inc., pp. 655-661, 9th IEEE International Conference on Advanced Robotics and Mechatronics, ICARM 2024, Tokyo, Japan, 8/07/24. https://doi.org/10.1109/ICARM62033.2024.10715884

Hybrid Reinforcement Learning based on Human Preference and Advice for Efficient Robot Skill Learning. / Li, Bingqian; Liu, Xing; Liu, Zhengxiong et al.
ICARM 2024 - 2024 9th IEEE International Conference on Advanced Robotics and Mechatronics. Institute of Electrical and Electronics Engineers Inc., 2024. pp. 655-661 (ICARM 2024 - 2024 9th IEEE International Conference on Advanced Robotics and Mechatronics).

TY - GEN

T1 - Hybrid Reinforcement Learning based on Human Preference and Advice for Efficient Robot Skill Learning

AU - Li, Bingqian

AU - Liu, Xing

AU - Liu, Zhengxiong

AU - Huang, Panfeng

PY - 2024

Y1 - 2024

N2 - The key to realize the application of robots in real world is to design intelligent robots with certain autonomous skill learning ability. Reinforcement learning is a feasible solution. However, two important challenges limit the application of RL methods in robotics, including the difficulty of human-designed reward as well as long training time. Therefore, we study hybrid RL methods, which use human knowledge to assist agent learning. First, we propose a reward learning method based on human preference model to realize robot skill learning, which has better robustness and convergence than the traditional RL method with human-designed reward. Then, we combine it with Episode-Fuzzy-COACH, our previous work, to build a hybrid RL method based on human preference and advice. In this method, preference model is used to infer reward function and human advice is used to speed up the policy learning process. It realizes efficient robot skill learning without human-designed reward function. And it is proven the learning efficiency of this method is 73.3% higher than that of the reward learning method that only uses preference model.

AB - The key to realize the application of robots in real world is to design intelligent robots with certain autonomous skill learning ability. Reinforcement learning is a feasible solution. However, two important challenges limit the application of RL methods in robotics, including the difficulty of human-designed reward as well as long training time. Therefore, we study hybrid RL methods, which use human knowledge to assist agent learning. First, we propose a reward learning method based on human preference model to realize robot skill learning, which has better robustness and convergence than the traditional RL method with human-designed reward. Then, we combine it with Episode-Fuzzy-COACH, our previous work, to build a hybrid RL method based on human preference and advice. In this method, preference model is used to infer reward function and human advice is used to speed up the policy learning process. It realizes efficient robot skill learning without human-designed reward function. And it is proven the learning efficiency of this method is 73.3% higher than that of the reward learning method that only uses preference model.

UR - http://www.scopus.com/inward/record.url?scp=85208034704&partnerID=8YFLogxK

U2 - 10.1109/ICARM62033.2024.10715884

DO - 10.1109/ICARM62033.2024.10715884

M3 - Conference contribution

AN - SCOPUS:85208034704

T3 - ICARM 2024 - 2024 9th IEEE International Conference on Advanced Robotics and Mechatronics

SP - 655

EP - 661

BT - ICARM 2024 - 2024 9th IEEE International Conference on Advanced Robotics and Mechatronics

PB - Institute of Electrical and Electronics Engineers Inc.

T2 - 9th IEEE International Conference on Advanced Robotics and Mechatronics, ICARM 2024

Y2 - 8 July 2024 through 10 July 2024

ER -

Li B, Liu X, Liu Z, Huang P. Hybrid Reinforcement Learning based on Human Preference and Advice for Efficient Robot Skill Learning. In ICARM 2024 - 2024 9th IEEE International Conference on Advanced Robotics and Mechatronics. Institute of Electrical and Electronics Engineers Inc. 2024. p. 655-661. (ICARM 2024 - 2024 9th IEEE International Conference on Advanced Robotics and Mechatronics). doi: 10.1109/ICARM62033.2024.10715884
