TY - JOUR
T1 - Edge Devices Friendly Self-Supervised Monocular Depth Estimation via Knowledge Distillation
AU - Gao, Wei
AU - Rao, Di
AU - Yang, Yang
AU - Chen, Jie
N1 - Publisher Copyright:
© 2023 IEEE.
PY - 2023/12/1
Y1 - 2023/12/1
N2 - Self-supervised monocular depth estimation (MDE) has great potential for deployment in a wide range of applications, including virtual reality, autonomous driving, and robotics. Nevertheless, most previous studies focused on complex architectures to pursue better performance in MDE. In this letter, we aim to develop a lightweight yet highly effective self-supervised MDE model that can deliver competitive performance in edge devices. We introduce a novel MobileViT-based depth (MViTDepth) network that can effectively capture both local features and global information by leveraging the strengths of convolutional neural networks (CNNs) and a vision transformer (ViT). To further compress the proposed MViTDepth model, we employ knowledge distillation, which leads to improved depth estimation performance. Specifically, the self-supervised MDE MonoViT is used as a teacher model to construct the knowledge distillation loss for optimizing a student model. Experimental results on benchmark datasets demonstrate that the proposed MViTDepth significantly outperforms Monodepth2 in terms of parameters and accuracy, thereby indicating its superiority in application to edge devices.
AB - Self-supervised monocular depth estimation (MDE) has great potential for deployment in a wide range of applications, including virtual reality, autonomous driving, and robotics. Nevertheless, most previous studies focused on complex architectures to pursue better performance in MDE. In this letter, we aim to develop a lightweight yet highly effective self-supervised MDE model that can deliver competitive performance in edge devices. We introduce a novel MobileViT-based depth (MViTDepth) network that can effectively capture both local features and global information by leveraging the strengths of convolutional neural networks (CNNs) and a vision transformer (ViT). To further compress the proposed MViTDepth model, we employ knowledge distillation, which leads to improved depth estimation performance. Specifically, the self-supervised MDE MonoViT is used as a teacher model to construct the knowledge distillation loss for optimizing a student model. Experimental results on benchmark datasets demonstrate that the proposed MViTDepth significantly outperforms Monodepth2 in terms of parameters and accuracy, thereby indicating its superiority in application to edge devices.
KW - autonomous vehicle navigation
KW - Deep learning for visual perception
KW - lightweight monocular depth estimation
UR - http://www.scopus.com/inward/record.url?scp=85177241001&partnerID=8YFLogxK
U2 - 10.1109/LRA.2023.3330054
DO - 10.1109/LRA.2023.3330054
M3 - Article
AN - SCOPUS:85177241001
SN - 2377-3766
VL - 8
SP - 8470
EP - 8477
JO - IEEE Robotics and Automation Letters
JF - IEEE Robotics and Automation Letters
IS - 12
ER -