Towards a Unified Network for Robust Monocular Depth Estimation: Network Architecture, Training Strategy and Dataset

Mochu Xiang; Yuchao Dai; Feiyu Zhang; Jiawei Shi; Xinyu Tian; Zhensong Zhang

doi:10.1007/s11263-023-01915-6

School of Electronics and Information

Research output: Contribution to journal › Article › peer-review

2 Citations (Scopus)

Abstract

Robust monocular depth estimation (MDE) aims at learning a unified model that works across diverse real-world scenes, which is an important and active topic in computer vision. In this paper, we present Megatron_RVC, our winning solution for the monocular depth challenge in the Robust Vision Challenge (RVC) 2022, where we tackle the challenging problem from three perspectives: network architecture, training strategy and dataset. In particular, we made three contributions towards robust MDE: (1) we built a neural network with high capacity to enable flexible and accurate monocular depth predictions, which contains dedicated components to provide content-aware embeddings and to improve the richness of the details; (2) we proposed a novel mixing training strategy to handle real-world images with different aspect ratios, resolutions and apply tailored loss functions based on the properties of their depth maps; (3) to train a unified network model that covers diverse real-world scenes, we used over 1 million images from different datasets. As of 3rd October 2022, our unified model ranked consistently first across three benchmarks (KITTI, MPI Sintel, and VIPER) among all participants.

Original language	English
Pages (from-to)	1012-1028
Number of pages	17
Journal	International Journal of Computer Vision
Volume	132
Issue number	4
DOI	https://doi.org/10.1007/s11263-023-01915-6
Publication status	Published - Apr 2024

Cite this

@article{8b2d9de334b24c57bbb0ff05b7bba32e,

title = "Towards a Unified Network for Robust Monocular Depth Estimation: Network Architecture, Training Strategy and Dataset",

abstract = "Robust monocular depth estimation (MDE) aims at learning a unified model that works across diverse real-world scenes, which is an important and active topic in computer vision. In this paper, we present Megatron_RVC, our winning solution for the monocular depth challenge in the Robust Vision Challenge (RVC) 2022, where we tackle the challenging problem from three perspectives: network architecture, training strategy and dataset. In particular, we made three contributions towards robust MDE: (1) we built a neural network with high capacity to enable flexible and accurate monocular depth predictions, which contains dedicated components to provide content-aware embeddings and to improve the richness of the details; (2) we proposed a novel mixing training strategy to handle real-world images with different aspect ratios, resolutions and apply tailored loss functions based on the properties of their depth maps; (3) to train a unified network model that covers diverse real-world scenes, we used over 1 million images from different datasets. As of 3rd October 2022, our unified model ranked consistently first across three benchmarks (KITTI, MPI Sintel, and VIPER) among all participants.",

keywords = "Monocular depth estimation, Multi-dataset training, Robust, Unified network",

author = "Mochu Xiang and Yuchao Dai and Feiyu Zhang and Jiawei Shi and Xinyu Tian and Zhensong Zhang",

note = "Publisher Copyright: {\textcopyright} The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2023.",

year = "2024",

month = apr,

doi = "10.1007/s11263-023-01915-6",

language = "English",

volume = "132",

pages = "1012--1028",

journal = "International Journal of Computer Vision",

issn = "0920-5691",

publisher = "Springer Netherlands",

number = "4",

}

TY - JOUR

T1 - Towards a Unified Network for Robust Monocular Depth Estimation

T2 - Network Architecture, Training Strategy and Dataset

AU - Xiang, Mochu

AU - Dai, Yuchao

AU - Zhang, Feiyu

AU - Shi, Jiawei

AU - Tian, Xinyu

AU - Zhang, Zhensong

N1 - Publisher Copyright: © The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2023.

PY - 2024/4

Y1 - 2024/4

N2 - Robust monocular depth estimation (MDE) aims at learning a unified model that works across diverse real-world scenes, which is an important and active topic in computer vision. In this paper, we present Megatron_RVC, our winning solution for the monocular depth challenge in the Robust Vision Challenge (RVC) 2022, where we tackle the challenging problem from three perspectives: network architecture, training strategy and dataset. In particular, we made three contributions towards robust MDE: (1) we built a neural network with high capacity to enable flexible and accurate monocular depth predictions, which contains dedicated components to provide content-aware embeddings and to improve the richness of the details; (2) we proposed a novel mixing training strategy to handle real-world images with different aspect ratios, resolutions and apply tailored loss functions based on the properties of their depth maps; (3) to train a unified network model that covers diverse real-world scenes, we used over 1 million images from different datasets. As of 3rd October 2022, our unified model ranked consistently first across three benchmarks (KITTI, MPI Sintel, and VIPER) among all participants.

AB - Robust monocular depth estimation (MDE) aims at learning a unified model that works across diverse real-world scenes, which is an important and active topic in computer vision. In this paper, we present Megatron_RVC, our winning solution for the monocular depth challenge in the Robust Vision Challenge (RVC) 2022, where we tackle the challenging problem from three perspectives: network architecture, training strategy and dataset. In particular, we made three contributions towards robust MDE: (1) we built a neural network with high capacity to enable flexible and accurate monocular depth predictions, which contains dedicated components to provide content-aware embeddings and to improve the richness of the details; (2) we proposed a novel mixing training strategy to handle real-world images with different aspect ratios, resolutions and apply tailored loss functions based on the properties of their depth maps; (3) to train a unified network model that covers diverse real-world scenes, we used over 1 million images from different datasets. As of 3rd October 2022, our unified model ranked consistently first across three benchmarks (KITTI, MPI Sintel, and VIPER) among all participants.

KW - Monocular depth estimation

KW - Multi-dataset training

KW - Robust

KW - Unified network

UR - http://www.scopus.com/inward/record.url?scp=85174602079&partnerID=8YFLogxK

U2 - 10.1007/s11263-023-01915-6

DO - 10.1007/s11263-023-01915-6

M3 - Article

AN - SCOPUS:85174602079

SN - 0920-5691

VL - 132

SP - 1012

EP - 1028

JO - International Journal of Computer Vision

JF - International Journal of Computer Vision

IS - 4

ER -
