TY - JOUR
T1 - Edge Devices Friendly Self-Supervised Monocular Depth Estimation via Knowledge Distillation
AU - Gao, Wei
AU - Rao, Di
AU - Yang, Yang
AU - Chen, Jie
N1 - Publisher Copyright:
© 2023 IEEE.
PY - 2023/12/1
Y1 - 2023/12/1
N2 - Self-supervised monocular depth estimation (MDE) has great potential for deployment in a wide range of applications, including virtual reality, autonomous driving, and robotics. Nevertheless, most previous studies focused on complex architectures to pursue better performance in MDE. In this letter, we aim to develop a lightweight yet highly effective self-supervised MDE model that can deliver competitive performance in edge devices. We introduce a novel MobileViT-based depth (MViTDepth) network that can effectively capture both local features and global information by leveraging the strengths of convolutional neural networks (CNNs) and a vision transformer (ViT). To further compress the proposed MViTDepth model, we employ knowledge distillation, which leads to improved depth estimation performance. Specifically, the self-supervised MDE MonoViT is used as a teacher model to construct the knowledge distillation loss for optimizing a student model. Experimental results on benchmark datasets demonstrate that the proposed MViTDepth significantly outperforms Monodepth2 in terms of parameters and accuracy, thereby indicating its superiority in application to edge devices.
AB - Self-supervised monocular depth estimation (MDE) has great potential for deployment in a wide range of applications, including virtual reality, autonomous driving, and robotics. Nevertheless, most previous studies focused on complex architectures to pursue better performance in MDE. In this letter, we aim to develop a lightweight yet highly effective self-supervised MDE model that can deliver competitive performance in edge devices. We introduce a novel MobileViT-based depth (MViTDepth) network that can effectively capture both local features and global information by leveraging the strengths of convolutional neural networks (CNNs) and a vision transformer (ViT). To further compress the proposed MViTDepth model, we employ knowledge distillation, which leads to improved depth estimation performance. Specifically, the self-supervised MDE MonoViT is used as a teacher model to construct the knowledge distillation loss for optimizing a student model. Experimental results on benchmark datasets demonstrate that the proposed MViTDepth significantly outperforms Monodepth2 in terms of parameters and accuracy, thereby indicating its superiority in application to edge devices.
KW - autonomous vehicle navigation
KW - Deep learning for visual perception
KW - lightweight monocular depth estimation
UR - http://www.scopus.com/inward/record.url?scp=85177241001&partnerID=8YFLogxK
U2 - 10.1109/LRA.2023.3330054
DO - 10.1109/LRA.2023.3330054
M3 - Article
AN - SCOPUS:85177241001
SN - 2377-3766
VL - 8
SP - 8470
EP - 8477
JO - IEEE Robotics and Automation Letters
JF - IEEE Robotics and Automation Letters
IS - 12
ER -