TY - JOUR
T1 - Improving Depth Completion via Depth Feature Upsampling
AU - Wang, Yufei
AU - Zhang, Ge
AU - Wang, Shaoqian
AU - Li, Bo
AU - Liu, Qi
AU - Hui, Le
AU - Dai, Yuchao
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - The encoder-decoder network (ED-Net) is a commonly employed choice for existing depth completion methods, but its working mechanism is ambiguous. In this paper, we visualize the internal feature maps to analyze how the network densifies the input sparse depth. We find that the encoder features of ED-Net focus on the areas around the input depth points. To obtain a dense feature and thus estimate complete depth, the decoder feature tends to complement and enhance the encoder feature via skip-connections to make the fused encoder-decoder feature dense, which results in the decoder feature also exhibiting sparsity. However, ED-Net obtains the sparse decoder feature from the dense fused feature at the previous stage, where this 'dense-to-sparse' process destroys the completeness of features and loses information. To address this issue, we present a depth feature upsampling network (DFU) that explicitly utilizes these dense features to guide the upsampling of a low-resolution (LR) depth feature to a high-resolution (HR) one. The completeness of features is maintained throughout the upsampling process, thus avoiding information loss. Furthermore, we propose a confidence-aware guidance module (CGM), which is confidence-aware and performs guidance with adaptive receptive fields (GARF), to fully exploit the potential of these dense features as guidance. Experimental results show that our DFU, a plug-and-play module, can significantly improve the performance of existing ED-Net based methods with limited computational overheads, and new SOTA results are achieved. Besides, the generalization capability on sparser depth is also enhanced. Project page: https://npucvr.github.io/DFU.
AB - The encoder-decoder network (ED-Net) is a commonly employed choice for existing depth completion methods, but its working mechanism is ambiguous. In this paper, we visualize the internal feature maps to analyze how the network densifies the input sparse depth. We find that the encoder features of ED-Net focus on the areas around the input depth points. To obtain a dense feature and thus estimate complete depth, the decoder feature tends to complement and enhance the encoder feature via skip-connections to make the fused encoder-decoder feature dense, which results in the decoder feature also exhibiting sparsity. However, ED-Net obtains the sparse decoder feature from the dense fused feature at the previous stage, where this 'dense-to-sparse' process destroys the completeness of features and loses information. To address this issue, we present a depth feature upsampling network (DFU) that explicitly utilizes these dense features to guide the upsampling of a low-resolution (LR) depth feature to a high-resolution (HR) one. The completeness of features is maintained throughout the upsampling process, thus avoiding information loss. Furthermore, we propose a confidence-aware guidance module (CGM), which is confidence-aware and performs guidance with adaptive receptive fields (GARF), to fully exploit the potential of these dense features as guidance. Experimental results show that our DFU, a plug-and-play module, can significantly improve the performance of existing ED-Net based methods with limited computational overheads, and new SOTA results are achieved. Besides, the generalization capability on sparser depth is also enhanced. Project page: https://npucvr.github.io/DFU.
KW - Deep Learning
KW - Depth Completion
UR - http://www.scopus.com/inward/record.url?scp=85208384064&partnerID=8YFLogxK
U2 - 10.1109/CVPR52733.2024.01994
DO - 10.1109/CVPR52733.2024.01994
M3 - Conference article
AN - SCOPUS:85208384064
SN - 1063-6919
SP - 21104
EP - 21113
JO - Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
JF - Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
T2 - 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024
Y2 - 16 June 2024 through 22 June 2024
ER -