TY - GEN
T1 - Utilizing Large Language Models for Robot Skill Reward Shaping in Reinforcement Learning
AU - Guo, Qi
AU - Liu, Xing
AU - Hui, Jianjiang
AU - Liu, Zhengxiong
AU - Huang, Panfeng
N1 - Publisher Copyright:
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025.
PY - 2025
Y1 - 2025
N2 - In this paper, we examine the integration of large language models (LLMs) into the design of reward functions for reinforcement learning (RL) to enhance robotic applications with minimal human input. In RL, the reward function is pivotal, guiding the agent’s learning trajectory by evaluating the desirability of behaviors within specific environments. Traditional reward functions, often sparse, lead to slow convergence, as agents require extensive interactions to learn effectively. By leveraging the ability of LLMs to generate code from task semantics, we propose a new method that reduces the complexity of reward design, allowing even non-experts to create effective reward functions using semantic prompts. We utilize the Soft Actor-Critic (SAC) algorithm, known for its efficiency and stability, to train agents under these conditions. To validate the efficacy of our method, we compare it with established techniques such as Trajectory-ranked Reward EXtrapolation (T-REX). Our findings indicate that the LLM-generated rewards enable quicker convergence and are as effective as those crafted through conventional methods, demonstrating the potential of LLMs to revolutionize reward shaping in RL. Furthermore, we transferred the robot door-opening task from the simulation environment to a real robot, achieving sim-to-real transfer. This approach allows for the rapid deployment of robotic systems, making sophisticated robotics technology more accessible and feasible for a wider range of applications. This study underscores the transformative impact of integrating advanced language models into the realm of robotics and RL, opening up new avenues for future research and application.
AB - In this paper, we examine the integration of large language models (LLMs) into the design of reward functions for reinforcement learning (RL) to enhance robotic applications with minimal human input. In RL, the reward function is pivotal, guiding the agent’s learning trajectory by evaluating the desirability of behaviors within specific environments. Traditional reward functions, often sparse, lead to slow convergence, as agents require extensive interactions to learn effectively. By leveraging the ability of LLMs to generate code from task semantics, we propose a new method that reduces the complexity of reward design, allowing even non-experts to create effective reward functions using semantic prompts. We utilize the Soft Actor-Critic (SAC) algorithm, known for its efficiency and stability, to train agents under these conditions. To validate the efficacy of our method, we compare it with established techniques such as Trajectory-ranked Reward EXtrapolation (T-REX). Our findings indicate that the LLM-generated rewards enable quicker convergence and are as effective as those crafted through conventional methods, demonstrating the potential of LLMs to revolutionize reward shaping in RL. Furthermore, we transferred the robot door-opening task from the simulation environment to a real robot, achieving sim-to-real transfer. This approach allows for the rapid deployment of robotic systems, making sophisticated robotics technology more accessible and feasible for a wider range of applications. This study underscores the transformative impact of integrating advanced language models into the realm of robotics and RL, opening up new avenues for future research and application.
KW - Large language models (LLMs)
KW - reinforcement learning
KW - reward shaping
UR - http://www.scopus.com/inward/record.url?scp=85218453102&partnerID=8YFLogxK
U2 - 10.1007/978-981-96-0783-9_1
DO - 10.1007/978-981-96-0783-9_1
M3 - Conference contribution
AN - SCOPUS:85218453102
SN - 9789819607822
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 3
EP - 17
BT - Intelligent Robotics and Applications - 17th International Conference, ICIRA 2024, Proceedings
A2 - Lan, Xuguang
A2 - Mei, Xuesong
A2 - Jiang, Caigui
A2 - Zhao, Fei
A2 - Tian, Zhiqiang
PB - Springer Science and Business Media Deutschland GmbH
T2 - 17th International Conference on Intelligent Robotics and Applications, ICIRA 2024
Y2 - 31 July 2024 through 2 August 2024
ER -