TY - GEN
T1 - Utilizing Large Language Models for Robot Skill Reward Shaping in Reinforcement Learning
AU - Guo, Qi
AU - Liu, Xing
AU - Hui, Jianjiang
AU - Liu, Zhengxiong
AU - Huang, Panfeng
N1 - Publisher Copyright:
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025.
PY - 2025
Y1 - 2025
N2 - In this paper, we examine the integration of large language models (LLMs) into the design of reward functions for reinforcement learning (RL) to enhance robotic applications with minimal human input. In RL, the reward function is pivotal, guiding the agent’s learning trajectory by evaluating the desirability of behaviors within specific environments. Traditional reward functions, often sparse, lead to slow convergence, as agents require extensive interactions to learn effectively. By leveraging the ability of LLMs to generate code from task semantics, we propose a new method that reduces the complexity of reward design, allowing even non-experts to create effective reward functions using semantic prompts. We utilize the Soft Actor-Critic (SAC) algorithm, known for its efficiency and stability, to train agents under these conditions. To validate the efficacy of our method, we compare it with established techniques such as Trajectory-ranked Reward EXtrapolation (T-REX). Our findings indicate that the LLM-generated rewards enable quicker convergence and are as effective as those crafted through conventional methods, demonstrating the potential of LLMs to revolutionize reward shaping in RL. Furthermore, we transferred the robot door-opening task from the simulation environment to a real robot, achieving sim-to-real transfer. This approach allows for the rapid deployment of robotic systems, making sophisticated robotics technology more accessible and feasible for a wider range of applications. This study underscores the transformative impact of integrating advanced language models into the realm of robotics and RL, opening up new avenues for future research and application.
AB - In this paper, we examine the integration of large language models (LLMs) into the design of reward functions for reinforcement learning (RL) to enhance robotic applications with minimal human input. In RL, the reward function is pivotal, guiding the agent’s learning trajectory by evaluating the desirability of behaviors within specific environments. Traditional reward functions, often sparse, lead to slow convergence, as agents require extensive interactions to learn effectively. By leveraging the ability of LLMs to generate code from task semantics, we propose a new method that reduces the complexity of reward design, allowing even non-experts to create effective reward functions using semantic prompts. We utilize the Soft Actor-Critic (SAC) algorithm, known for its efficiency and stability, to train agents under these conditions. To validate the efficacy of our method, we compare it with established techniques such as Trajectory-ranked Reward EXtrapolation (T-REX). Our findings indicate that the LLM-generated rewards enable quicker convergence and are as effective as those crafted through conventional methods, demonstrating the potential of LLMs to revolutionize reward shaping in RL. Furthermore, we transferred the robot door-opening task from the simulation environment to a real robot, achieving sim-to-real transfer. This approach allows for the rapid deployment of robotic systems, making sophisticated robotics technology more accessible and feasible for a wider range of applications. This study underscores the transformative impact of integrating advanced language models into the realm of robotics and RL, opening up new avenues for future research and application.
KW - Large language models (LLMs)
KW - reinforcement learning
KW - reward shaping
UR - http://www.scopus.com/inward/record.url?scp=85218453102&partnerID=8YFLogxK
U2 - 10.1007/978-981-96-0783-9_1
DO - 10.1007/978-981-96-0783-9_1
M3 - Conference contribution
AN - SCOPUS:85218453102
SN - 9789819607822
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 3
EP - 17
BT - Intelligent Robotics and Applications - 17th International Conference, ICIRA 2024, Proceedings
A2 - Lan, Xuguang
A2 - Mei, Xuesong
A2 - Jiang, Caigui
A2 - Zhao, Fei
A2 - Tian, Zhiqiang
PB - Springer Science and Business Media Deutschland GmbH
T2 - 17th International Conference on Intelligent Robotics and Applications, ICIRA 2024
Y2 - 31 July 2024 through 2 August 2024
ER -