TY - JOUR
T1 - Towards a Unified Network for Robust Monocular Depth Estimation
T2 - Network Architecture, Training Strategy and Dataset
AU - Xiang, Mochu
AU - Dai, Yuchao
AU - Zhang, Feiyu
AU - Shi, Jiawei
AU - Tian, Xinyu
AU - Zhang, Zhensong
N1 - Publisher Copyright: © The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2023.
PY - 2024/4
Y1 - 2024/4
AB - Robust monocular depth estimation (MDE) aims to learn a unified model that works across diverse real-world scenes, and it is an important and active topic in computer vision. In this paper, we present Megatron_RVC, our winning solution for the monocular depth challenge in the Robust Vision Challenge (RVC) 2022, in which we tackle the problem from three perspectives: network architecture, training strategy and dataset. In particular, we made three contributions towards robust MDE: (1) we built a high-capacity neural network that enables flexible and accurate monocular depth predictions and contains dedicated components to provide content-aware embeddings and to improve the richness of details; (2) we proposed a novel mixing training strategy that handles real-world images with different aspect ratios and resolutions and applies tailored loss functions based on the properties of their depth maps; (3) to train a unified network model that covers diverse real-world scenes, we used over 1 million images from different datasets. As of 3 October 2022, our unified model consistently ranked first across three benchmarks (KITTI, MPI Sintel, and VIPER) among all participants.
KW - Monocular depth estimation
KW - Multi-dataset training
KW - Robust
KW - Unified network
UR - http://www.scopus.com/inward/record.url?scp=85174602079&partnerID=8YFLogxK
U2 - 10.1007/s11263-023-01915-6
DO - 10.1007/s11263-023-01915-6
M3 - Article
AN - SCOPUS:85174602079
SN - 0920-5691
VL - 132
SP - 1012
EP - 1028
JO - International Journal of Computer Vision
JF - International Journal of Computer Vision
IS - 4
ER -