Swin-Depth: Using Transformers and Multi-Scale Fusion for Monocular-Based Depth Estimation

Zeyu Cheng, Yi Zhang, Chengkai Tang

Research output: Contribution to journal › Article › peer-review

38 Citations (Scopus)

Abstract

Depth estimation from a monocular sensor is a fundamental task in computer vision with wide applications in robot navigation, autonomous driving, and related areas, and it has received extensive attention from researchers in recent years. Monocular depth estimation has long relied on convolutional neural networks, but the inherent locality of the convolution operation limits their ability to model long-range dependencies. Replacing convolutional neural networks with Transformers is a promising direction, but standard Transformers suffer from excessive computational complexity and parameter counts. To address these problems, we propose Swin-Depth, a Transformer-based monocular depth estimation method that performs hierarchical representation learning on images with computational complexity linear in image size. In addition, Swin-Depth includes an attention module based on multi-scale fusion that strengthens the network's ability to capture global information. Our method substantially reduces the parameter overhead of Transformer-based monocular depth estimation, and extensive experiments show that Swin-Depth achieves state-of-the-art performance on challenging indoor and outdoor datasets.
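
As a rough illustration of the two ideas named in the abstract, the sketch below pairs window-restricted self-attention (the mechanism that gives Swin-style encoders complexity linear in image size) with a simple multi-scale fusion head that merges features from several encoder stages into one depth map. This is a minimal PyTorch sketch under our own assumptions, not the authors' implementation: the class names, the gated fusion, and the stage channel widths (taken from Swin-T) are illustrative, and shifted windows and relative position biases are omitted for brevity.

import torch
import torch.nn as nn
import torch.nn.functional as F

class WindowAttention(nn.Module):
    """Self-attention restricted to non-overlapping windows.

    For a fixed window size, cost grows with the number of windows,
    i.e., linearly with image area -- the complexity argument above.
    """
    def __init__(self, dim, window=7, heads=4):
        super().__init__()
        self.window, self.heads = window, heads
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):  # x: (B, H, W, C); H and W divisible by window
        B, H, W, C = x.shape
        w = self.window
        # Partition the feature map into (B * num_windows, w*w, C) token groups.
        x = x.view(B, H // w, w, W // w, w, C).permute(0, 1, 3, 2, 4, 5)
        x = x.reshape(-1, w * w, C)
        qkv = self.qkv(x).reshape(x.shape[0], w * w, 3, self.heads, C // self.heads)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)  # each: (B*nW, heads, w*w, C/heads)
        attn = (q @ k.transpose(-2, -1)) * (C // self.heads) ** -0.5
        out = (attn.softmax(-1) @ v).transpose(1, 2).reshape(-1, w * w, C)
        out = self.proj(out)
        # Reverse the window partition back to (B, H, W, C).
        out = out.view(B, H // w, W // w, w, w, C).permute(0, 1, 3, 2, 4, 5)
        return out.reshape(B, H, W, C)

class MultiScaleFusionHead(nn.Module):
    """Upsamples per-stage features to a common resolution and fuses them
    with learned per-pixel attention weights before predicting depth."""
    def __init__(self, chans=(96, 192, 384, 768), fused=64):
        super().__init__()
        self.reduce = nn.ModuleList([nn.Conv2d(c, fused, 1) for c in chans])
        self.gate = nn.Sequential(
            nn.Conv2d(fused * len(chans), len(chans), 1), nn.Softmax(dim=1))
        self.head = nn.Conv2d(fused, 1, 3, padding=1)

    def forward(self, feats):  # feats: list of (B, C_i, H_i, W_i), coarse to fine
        size = feats[0].shape[-2:]
        maps = [F.interpolate(r(f), size=size, mode="bilinear", align_corners=False)
                for r, f in zip(self.reduce, feats)]
        w = self.gate(torch.cat(maps, dim=1))  # (B, num_scales, H, W) weights
        fused = sum(w[:, i:i + 1] * m for i, m in enumerate(maps))
        return torch.sigmoid(self.head(fused))  # normalized depth in (0, 1)

Because attention is computed only inside fixed-size windows, the cost scales with the number of windows rather than quadratically with the total token count, which is what makes hierarchical encoders of this kind tractable at depth-estimation resolutions.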

Original language: English
Pages (from-to): 26912-26920
Number of pages: 9
Journal: IEEE Sensors Journal
Volume: 21
Issue number: 23
DOI
Publication status: Published - 1 Dec 2021

