TY - GEN
T1 - Depth Helps: Improving Pre-trained RGB-based Policy with Depth Information Injection
T2 - 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2024
AU - Pang, Xincheng
AU - Xia, Wenke
AU - Wang, Zhigang
AU - Zhao, Bin
AU - Hu, Di
AU - Wang, Dong
AU - Li, Xuelong
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - 3D perception is crucial for generalizable robotic manipulation. While recent foundation models have made significant strides in perception and decision-making with RGB-based inputs, their lack of 3D perception limits their effectiveness in fine-grained robotic manipulation tasks. To address this limitation, we propose a Depth Information Injection (DI2) framework that leverages the RGB-Depth modality for policy fine-tuning while relying solely on RGB images for robust and efficient deployment. Concretely, we introduce the Depth Completion Module (DCM) to extract spatial prior knowledge related to depth and generate virtual depth information from RGB inputs to aid policy deployment. Further, we propose the Depth-Aware Codebook (DAC) to eliminate noise and reduce the cumulative error of depth prediction. In the inference phase, the framework employs RGB inputs and the accurately predicted depth to generate manipulation actions. We conduct experiments in simulated LIBERO environments and real-world scenarios; the results demonstrate that our method effectively equips the pre-trained RGB-based policy with 3D perception ability for robotic manipulation. The project website is available at https://gewu-lab.github.io/DepthHelps-IROS2024.
AB - 3D perception is crucial for generalizable robotic manipulation. While recent foundation models have made significant strides in perception and decision-making with RGB-based inputs, their lack of 3D perception limits their effectiveness in fine-grained robotic manipulation tasks. To address this limitation, we propose a Depth Information Injection (DI2) framework that leverages the RGB-Depth modality for policy fine-tuning while relying solely on RGB images for robust and efficient deployment. Concretely, we introduce the Depth Completion Module (DCM) to extract spatial prior knowledge related to depth and generate virtual depth information from RGB inputs to aid policy deployment. Further, we propose the Depth-Aware Codebook (DAC) to eliminate noise and reduce the cumulative error of depth prediction. In the inference phase, the framework employs RGB inputs and the accurately predicted depth to generate manipulation actions. We conduct experiments in simulated LIBERO environments and real-world scenarios; the results demonstrate that our method effectively equips the pre-trained RGB-based policy with 3D perception ability for robotic manipulation. The project website is available at https://gewu-lab.github.io/DepthHelps-IROS2024.
UR - http://www.scopus.com/inward/record.url?scp=85216472131&partnerID=8YFLogxK
U2 - 10.1109/IROS58592.2024.10802706
DO - 10.1109/IROS58592.2024.10802706
M3 - Conference contribution
AN - SCOPUS:85216472131
T3 - IEEE International Conference on Intelligent Robots and Systems
SP - 7251
EP - 7256
BT - 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2024
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 14 October 2024 through 18 October 2024
ER -