Towards a Unified Network for Robust Monocular Depth Estimation: Network Architecture, Training Strategy and Dataset

Mochu Xiang, Yuchao Dai, Feiyu Zhang, Jiawei Shi, Xinyu Tian, Zhensong Zhang

Research output: Contribution to journal › Article › peer-review

2 Scopus citations

Abstract

Robust monocular depth estimation (MDE) aims at learning a unified model that works across diverse real-world scenes, which is an important and active topic in computer vision. In this paper, we present Megatron_RVC, our winning solution for the monocular depth challenge in the Robust Vision Challenge (RVC) 2022, where we tackle the challenging problem from three perspectives: network architecture, training strategy and dataset. In particular, we made three contributions towards robust MDE: (1) we built a neural network with high capacity to enable flexible and accurate monocular depth predictions, which contains dedicated components to provide content-aware embeddings and to improve the richness of the details; (2) we proposed a novel mixing training strategy to handle real-world images with different aspect ratios and resolutions and to apply tailored loss functions based on the properties of their depth maps; (3) to train a unified network model that covers diverse real-world scenes, we used over 1 million images from different datasets. As of 3rd October 2022, our unified model ranked consistently first across three benchmarks (KITTI, MPI Sintel, and VIPER) among all participants.

Original language: English
Pages (from-to): 1012-1028
Number of pages: 17
Journal: International Journal of Computer Vision
Volume: 132
Issue number: 4
State: Published - Apr 2024

Keywords

  • Monocular depth estimation
  • Multi-dataset training
  • Robust
  • Unified network
