Towards a Unified Network for Robust Monocular Depth Estimation: Network Architecture, Training Strategy and Dataset

Mochu Xiang, Yuchao Dai, Feiyu Zhang, Jiawei Shi, Xinyu Tian, Zhensong Zhang

科研成果: 期刊稿件文章同行评审

2 引用 (Scopus)

摘要

Robust monocular depth estimation (MDE) aims at learning a unified model that works across diverse real-world scenes, which is an important and active topic in computer vision. In this paper, we present Megatron_RVC, our winning solution for the monocular depth challenge in the Robust Vision Challenge (RVC) 2022, where we tackle the challenging problem from three perspectives: network architecture, training strategy and dataset. In particular, we made three contributions towards robust MDE: (1) we built a neural network with high capacity to enable flexible and accurate monocular depth predictions, which contains dedicated components to provide content-aware embeddings and to improve the richness of the details; (2) we proposed a novel mixing training strategy to handle real-world images with different aspect ratios, resolutions and apply tailored loss functions based on the properties of their depth maps; (3) to train a unified network model that covers diverse real-world scenes, we used over 1 million images from different datasets. As of 3rd October 2022, our unified model ranked consistently first across three benchmarks (KITTI, MPI Sintel, and VIPER) among all participants.

源语言英语
页(从-至)1012-1028
页数17
期刊International Journal of Computer Vision
132
4
DOI
出版状态已出版 - 4月 2024

指纹

探究 'Towards a Unified Network for Robust Monocular Depth Estimation: Network Architecture, Training Strategy and Dataset' 的科研主题。它们共同构成独一无二的指纹。

引用此