TY - GEN
T1 - Depth Helps: Improving Pre-trained RGB-based Policy with Depth Information Injection
T2 - 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2024
AU - Pang, Xincheng
AU - Xia, Wenke
AU - Wang, Zhigang
AU - Zhao, Bin
AU - Hu, Di
AU - Wang, Dong
AU - Li, Xuelong
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - 3D perception is crucial for generalizable robotic manipulation. While recent foundation models have made significant strides in perception and decision-making with RGB-based inputs, their lack of 3D perception limits their effectiveness in fine-grained robotic manipulation tasks. To address this limitation, we propose a Depth Information Injection (DI2) framework that leverages the RGB-Depth modality for policy fine-tuning while relying solely on RGB images for robust and efficient deployment. Concretely, we introduce the Depth Completion Module (DCM) to extract spatial prior knowledge related to depth and generate virtual depth information from RGB inputs to aid policy deployment. Further, we propose the Depth-Aware Codebook (DAC) to eliminate noise and reduce the cumulative error of depth prediction. In the inference phase, the framework employs RGB inputs and the accurately predicted depth to generate manipulation actions. We conduct experiments in simulated LIBERO environments and real-world scenarios; the results demonstrate that our method effectively equips the pre-trained RGB-based policy with 3D perception ability for robotic manipulation. The project website is available at https://gewu-lab.github.io/DepthHelps-IROS2024.
AB - 3D perception is crucial for generalizable robotic manipulation. While recent foundation models have made significant strides in perception and decision-making with RGB-based inputs, their lack of 3D perception limits their effectiveness in fine-grained robotic manipulation tasks. To address this limitation, we propose a Depth Information Injection (DI2) framework that leverages the RGB-Depth modality for policy fine-tuning while relying solely on RGB images for robust and efficient deployment. Concretely, we introduce the Depth Completion Module (DCM) to extract spatial prior knowledge related to depth and generate virtual depth information from RGB inputs to aid policy deployment. Further, we propose the Depth-Aware Codebook (DAC) to eliminate noise and reduce the cumulative error of depth prediction. In the inference phase, the framework employs RGB inputs and the accurately predicted depth to generate manipulation actions. We conduct experiments in simulated LIBERO environments and real-world scenarios; the results demonstrate that our method effectively equips the pre-trained RGB-based policy with 3D perception ability for robotic manipulation. The project website is available at https://gewu-lab.github.io/DepthHelps-IROS2024.
UR - http://www.scopus.com/inward/record.url?scp=85216472131&partnerID=8YFLogxK
U2 - 10.1109/IROS58592.2024.10802706
DO - 10.1109/IROS58592.2024.10802706
M3 - Conference contribution
AN - SCOPUS:85216472131
T3 - IEEE International Conference on Intelligent Robots and Systems
SP - 7251
EP - 7256
BT - 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2024
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 14 October 2024 through 18 October 2024
ER -