Abstract
Self-supervised monocular depth estimation (MDE) has great potential for deployment in a wide range of applications, including virtual reality, autonomous driving, and robotics. Nevertheless, most previous studies have focused on complex architectures to pursue better MDE performance. In this letter, we aim to develop a lightweight yet highly effective self-supervised MDE model that can deliver competitive performance on edge devices. We introduce a novel MobileViT-based depth (MViTDepth) network that effectively captures both local features and global information by leveraging the strengths of convolutional neural networks (CNNs) and a vision transformer (ViT). To further compress the proposed MViTDepth model, we employ knowledge distillation, which also improves depth estimation performance. Specifically, the self-supervised MDE model MonoViT is used as a teacher to construct a knowledge distillation loss for optimizing the student model. Experimental results on benchmark datasets demonstrate that the proposed MViTDepth significantly outperforms Monodepth2 in both parameter count and accuracy, indicating its suitability for deployment on edge devices.
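The abstract does not reproduce the distillation loss itself. As an illustration only, a common form of depth distillation penalizes the discrepancy between student and teacher depth maps in log-depth space; the sketch below assumes this log-L1 form, and the function name and NumPy formulation are hypothetical, not the paper's actual objective.

```python
import numpy as np

def distillation_loss(student_depth, teacher_depth, eps=1e-6):
    """Hypothetical per-pixel distillation loss: mean L1 distance in
    log-depth space between the student's prediction and the teacher's
    (e.g. MonoViT) prediction. Both inputs are positive depth maps of
    the same shape; eps guards against log(0)."""
    student_log = np.log(student_depth + eps)
    teacher_log = np.log(teacher_depth + eps)
    return float(np.mean(np.abs(student_log - teacher_log)))
```

In practice such a term would be added, with a weighting factor, to the usual self-supervised photometric and smoothness losses; the exact combination used in the letter is not given in this record.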
| Original language | English |
|---|---|
| Pages (from-to) | 8470-8477 |
| Number of pages | 8 |
| Journal | IEEE Robotics and Automation Letters |
| Volume | 8 |
| Issue number | 12 |
| DOIs | |
| State | Published - 1 Dec 2023 |
Keywords
- Autonomous vehicle navigation
- Deep learning for visual perception
- Lightweight monocular depth estimation
Fingerprint
Dive into the research topics of 'Edge Devices Friendly Self-Supervised Monocular Depth Estimation via Knowledge Distillation'.