Utilizing Large Language Models for Robot Skill Reward Shaping in Reinforcement Learning

Qi Guo, Xing Liu, Jianjiang Hui, Zhengxiong Liu, Panfeng Huang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

In this paper, we examine the integration of large language models (LLMs) into the design of reward functions for reinforcement learning (RL) to enhance robotic applications with minimal human input. In RL, the reward function is pivotal: it guides the agent's learning trajectory by evaluating the desirability of behaviors within specific environments. Traditional reward functions are often sparse, leading to slow convergence because agents require extensive interactions to learn effectively. By leveraging the ability of LLMs to generate code from task semantics, we propose a new method that reduces the complexity of reward design, allowing even non-experts to create effective reward policies from semantic prompts. We use the Soft Actor-Critic (SAC) algorithm, known for its efficiency and stability, to train agents under these conditions. To validate the efficacy of our method, we compare it with established techniques such as Trajectory-ranked Reward EXtrapolation (T-REX). Our findings indicate that LLM-generated rewards enable faster convergence and are as effective as those crafted through conventional methods, demonstrating the potential of LLMs to revolutionize reward shaping in RL. Furthermore, we transferred a robot door-opening task from the simulation environment to a real robot, achieving sim-to-real transfer. This approach allows for the rapid deployment of robotic systems, making sophisticated robotics technology more accessible and feasible for a wider range of applications. This study underscores the transformative impact of integrating advanced language models into robotics and RL, opening new avenues for future research and application.
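The pipeline described in the abstract can be sketched in a few lines. The example below is a minimal illustration, not the authors' released code: it assumes the gymnasium and stable-baselines3 libraries and a simple Pendulum task, and the llm_generated_reward string is a hand-written stand-in for what the LLM would actually produce from a semantic task prompt (e.g., "open the door by rotating the handle").

import gymnasium as gym
from stable_baselines3 import SAC

# Stand-in for LLM output. In the actual method, this code string would be
# generated by the LLM from a natural-language description of the task.
llm_generated_reward = """
def reward_fn(obs, action):
    # Dense shaping: penalize deviation from upright, angular velocity,
    # and control effort. (Pendulum obs = [cos(theta), sin(theta), theta_dot])
    import numpy as np
    cos_th, sin_th, th_dot = obs
    angle = np.arctan2(sin_th, cos_th)
    return -(angle ** 2 + 0.1 * th_dot ** 2 + 0.001 * float(action[0] ** 2))
"""

# Compile the generated code and extract the reward function.
namespace = {}
exec(llm_generated_reward, namespace)
reward_fn = namespace["reward_fn"]

class LLMRewardWrapper(gym.Wrapper):
    """Replace the environment's native reward with the LLM-shaped one."""
    def step(self, action):
        obs, _, terminated, truncated, info = self.env.step(action)
        return obs, reward_fn(obs, action), terminated, truncated, info

# Train a SAC agent under the LLM-generated dense reward.
env = LLMRewardWrapper(gym.make("Pendulum-v1"))
model = SAC("MlpPolicy", env, verbose=0)
model.learn(total_timesteps=10_000)  # short run, for illustration only

The same pattern would apply to the door-opening task: only the prompt (and hence the generated reward_fn) changes, while the SAC training loop stays fixed.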

Original language: English
Title of host publication: Intelligent Robotics and Applications - 17th International Conference, ICIRA 2024, Proceedings
Editors: Xuguang Lan, Xuesong Mei, Caigui Jiang, Fei Zhao, Zhiqiang Tian
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 3-17
Number of pages: 15
ISBN (Print): 9789819607822
DOIs
State: Published - 2025
Event: 17th International Conference on Intelligent Robotics and Applications, ICIRA 2024 - Xi'an, China
Duration: 31 Jul 2024 – 2 Aug 2024

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 15208 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 17th International Conference on Intelligent Robotics and Applications, ICIRA 2024
Country/Territory: China
City: Xi'an
Period: 31/07/24 – 2/08/24

Keywords

  • Large language models (LLMs)
  • reinforcement learning
  • reward shaping
