TY - GEN
T1 - Kinematic-aware Prompting for Generalizable Articulated Object Manipulation with LLMs
AU - Xia, Wenke
AU - Wang, Dong
AU - Pang, Xincheng
AU - Wang, Zhigang
AU - Zhao, Bin
AU - Hu, Di
AU - Li, Xuelong
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - Generalizable articulated object manipulation is essential for home-assistant robots. Recent efforts focus on imitation learning from demonstrations or reinforcement learning in simulation; however, due to the prohibitive costs of real-world data collection and precise object simulation, these methods still struggle to achieve broad adaptability across diverse articulated objects. Recently, many works have tried to utilize the strong in-context learning ability of Large Language Models (LLMs) to achieve generalizable robotic manipulation, but most of this research focuses on high-level task planning, sidelining low-level robotic control. In this work, building on the idea that the kinematic structure of an object determines how we can manipulate it, we propose a kinematic-aware prompting framework that prompts LLMs with kinematic knowledge of objects to generate low-level motion trajectory waypoints, supporting various object manipulation tasks. To effectively prompt LLMs with the kinematic structure of different objects, we design a unified kinematic knowledge parser, which represents various articulated objects as a unified textual description containing kinematic joints and contact locations. Building upon this unified description, a kinematic-aware planner model is proposed to generate precise 3D manipulation waypoints via a designed kinematic-aware chain-of-thought prompting method. Our evaluation spans 48 instances across 16 distinct categories, revealing that our framework not only outperforms traditional methods on 8 seen categories but also shows powerful zero-shot capability on 8 unseen articulated object categories with only 17 demonstrations. Moreover, real-world experiments on 7 different object categories demonstrate our framework's adaptability in practical scenarios. Code is released at https://github.com/GeWu-Lab/LLM-articulated-object-manipulation.
AB - Generalizable articulated object manipulation is essential for home-assistant robots. Recent efforts focus on imitation learning from demonstrations or reinforcement learning in simulation; however, due to the prohibitive costs of real-world data collection and precise object simulation, these methods still struggle to achieve broad adaptability across diverse articulated objects. Recently, many works have tried to utilize the strong in-context learning ability of Large Language Models (LLMs) to achieve generalizable robotic manipulation, but most of this research focuses on high-level task planning, sidelining low-level robotic control. In this work, building on the idea that the kinematic structure of an object determines how we can manipulate it, we propose a kinematic-aware prompting framework that prompts LLMs with kinematic knowledge of objects to generate low-level motion trajectory waypoints, supporting various object manipulation tasks. To effectively prompt LLMs with the kinematic structure of different objects, we design a unified kinematic knowledge parser, which represents various articulated objects as a unified textual description containing kinematic joints and contact locations. Building upon this unified description, a kinematic-aware planner model is proposed to generate precise 3D manipulation waypoints via a designed kinematic-aware chain-of-thought prompting method. Our evaluation spans 48 instances across 16 distinct categories, revealing that our framework not only outperforms traditional methods on 8 seen categories but also shows powerful zero-shot capability on 8 unseen articulated object categories with only 17 demonstrations. Moreover, real-world experiments on 7 different object categories demonstrate our framework's adaptability in practical scenarios. Code is released at https://github.com/GeWu-Lab/LLM-articulated-object-manipulation.
UR - http://www.scopus.com/inward/record.url?scp=85201307530&partnerID=8YFLogxK
U2 - 10.1109/ICRA57147.2024.10610744
DO - 10.1109/ICRA57147.2024.10610744
M3 - Conference contribution
AN - SCOPUS:85201307530
T3 - Proceedings - IEEE International Conference on Robotics and Automation
SP - 2073
EP - 2080
BT - 2024 IEEE International Conference on Robotics and Automation, ICRA 2024
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2024 IEEE International Conference on Robotics and Automation, ICRA 2024
Y2 - 13 May 2024 through 17 May 2024
ER -