Decomposed Guided Dynamic Filters for Efficient RGB-Guided Depth Completion

Yufei Wang; Yuxin Mao; Qi Liu; Yuchao Dai

doi:10.1109/TCSVT.2023.3292398

School of Electronics and Information

Northwestern Polytechnical University Xian

Research output: Contribution to journal › Article › peer-review

7 Scopus citations

Abstract

RGB-guided depth completion aims to predict dense depth maps from sparse depth measurements and corresponding RGB images; a key issue is how to exploit the multi-modal information effectively and efficiently. Guided dynamic filters, which generate spatially-variant depth-wise separable convolutional filters from RGB features to guide depth features, have proven effective in this task. However, the dynamically generated filters require massive model parameters, computational cost, and memory footprint when the number of feature channels is large. In this paper, we propose to decompose the guided dynamic filters into a spatially-shared component multiplied by content-adaptive adaptors at each spatial location. Based on this idea, we introduce two decomposition schemes, A and B, which decompose the filters by splitting the filter structure and by using spatial-wise attention, respectively. The decomposed filters not only retain the favorable properties of guided dynamic filters, being content-dependent and spatially-variant, but also reduce model parameters and hardware costs, because the learned adaptors are decoupled from the number of feature channels. Extensive experimental results demonstrate that the methods using our schemes outperform state-of-the-art methods on the KITTI dataset, and ranked 1st and 2nd on the KITTI benchmark at the time of submission. They also achieve comparable performance on the NYUv2 dataset. In addition, our proposed methods are general and could be employed as plug-and-play feature fusion blocks in other multi-modal fusion tasks such as RGB-D salient object detection.
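
The decomposition idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' scheme A or B, whose details are not given here. It only shows the general pattern of a spatially-shared depth-wise kernel multiplied by a per-location adaptor whose size does not depend on the channel count. The function names, tensor layouts, and the choice of a k*k adaptor per pixel are assumptions made for illustration, and the point-wise part of a depth-wise separable filter is omitted.

```python
import numpy as np

def decomposed_dynamic_filter(depth_feat, shared_kernel, adaptor):
    """Illustrative only; shapes and names are assumptions, not the paper's.

    depth_feat:    (C, H, W) depth features to be filtered
    shared_kernel: (C, k, k) depth-wise kernel shared by every location
    adaptor:       (k*k, H, W) per-location weights, assumed to be
                   predicted from RGB features elsewhere
    """
    C, H, W = depth_feat.shape
    k = shared_kernel.shape[-1]
    pad = k // 2
    padded = np.pad(depth_feat, ((0, 0), (pad, pad), (pad, pad)))
    out = np.zeros_like(depth_feat)
    for i in range(k):
        for j in range(k):
            # Effective tap at (c, y, x) is
            # shared_kernel[c, i, j] * adaptor[i*k + j, y, x].
            tap = shared_kernel[:, i, j][:, None, None] * adaptor[i * k + j][None]
            out += tap * padded[:, i:i + H, j:j + W]
    return out

def filter_values(C, H, W, k):
    """Count filter values: fully dynamic vs. this decomposed form."""
    full = C * k * k * H * W                # one k*k kernel per channel per pixel
    decomposed = k * k * H * W + C * k * k  # per-pixel adaptor + shared kernel
    return full, decomposed
```

Under these assumptions, the per-pixel cost of the decomposed form no longer grows with the number of channels C, which is the property the abstract attributes to the decoupled adaptors.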

Original language	English
Pages (from-to)	1186-1198
Number of pages	13
Journal	IEEE Transactions on Circuits and Systems for Video Technology
Volume	34
Issue number	2
DOIs	https://doi.org/10.1109/TCSVT.2023.3292398
State	Published - 1 Feb 2024

Keywords

Depth completion
feature fusion
guided dynamic filter
multi-modal
range sensing

Access to Document

10.1109/TCSVT.2023.3292398

Cite this

@article{0671fc9f55114f86ba8f5051c3379213,

title = "Decomposed Guided Dynamic Filters for Efficient RGB-Guided Depth Completion",

abstract = "RGB-guided depth completion aims at predicting dense depth maps from sparse depth measurements and corresponding RGB images, where how to effectively and efficiently exploit the multi-modal information is a key issue. Guided dynamic filters, which generate spatially-variant depth-wise separable convolutional filters from RGB features to guide depth features, have been proven to be effective in this task. However, the dynamically generated filters require massive model parameters, computational costs and memory footprints when the number of feature channels is large. In this paper, we propose to decompose the guided dynamic filters into a spatially-shared component multiplied by content-adaptive adaptors at each spatial location. Based on the proposed idea, we introduce two decomposition schemes A and B, which decompose the filters by splitting the filter structure and using spatial-wise attention, respectively. The decomposed filters not only maintain the favorable properties of guided dynamic filters as being content-dependent and spatially-variant, but also reduce model parameters and hardware costs, as the learned adaptors are decoupled with the number of feature channels. Extensive experimental results demonstrate that the methods using our schemes outperform state-of-the-art methods on the KITTI dataset, and rank 1st and 2nd on the KITTI benchmark at the time of submission. Meanwhile, they also achieve comparable performance on the NYUv2 dataset. In addition, our proposed methods are general and could be employed as plug-and-play feature fusion blocks in other multi-modal fusion tasks such as RGB-D salient object detection.",

keywords = "Depth completion, feature fusion, guided dynamic filter, multi-modal, range sensing",

author = "Yufei Wang and Yuxin Mao and Qi Liu and Yuchao Dai",

note = "Publisher Copyright: {\textcopyright} 1991-2012 IEEE.",

year = "2024",

month = feb,

day = "1",

doi = "10.1109/TCSVT.2023.3292398",

language = "English",

volume = "34",

pages = "1186--1198",

journal = "IEEE Transactions on Circuits and Systems for Video Technology",

issn = "1051-8215",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

number = "2",

}

TY - JOUR

T1 - Decomposed Guided Dynamic Filters for Efficient RGB-Guided Depth Completion

AU - Wang, Yufei

AU - Mao, Yuxin

AU - Liu, Qi

AU - Dai, Yuchao

PY - 2024/2/1

Y1 - 2024/2/1

N2 - RGB-guided depth completion aims at predicting dense depth maps from sparse depth measurements and corresponding RGB images, where how to effectively and efficiently exploit the multi-modal information is a key issue. Guided dynamic filters, which generate spatially-variant depth-wise separable convolutional filters from RGB features to guide depth features, have been proven to be effective in this task. However, the dynamically generated filters require massive model parameters, computational costs and memory footprints when the number of feature channels is large. In this paper, we propose to decompose the guided dynamic filters into a spatially-shared component multiplied by content-adaptive adaptors at each spatial location. Based on the proposed idea, we introduce two decomposition schemes A and B, which decompose the filters by splitting the filter structure and using spatial-wise attention, respectively. The decomposed filters not only maintain the favorable properties of guided dynamic filters as being content-dependent and spatially-variant, but also reduce model parameters and hardware costs, as the learned adaptors are decoupled with the number of feature channels. Extensive experimental results demonstrate that the methods using our schemes outperform state-of-the-art methods on the KITTI dataset, and rank 1st and 2nd on the KITTI benchmark at the time of submission. Meanwhile, they also achieve comparable performance on the NYUv2 dataset. In addition, our proposed methods are general and could be employed as plug-and-play feature fusion blocks in other multi-modal fusion tasks such as RGB-D salient object detection.

AB - RGB-guided depth completion aims at predicting dense depth maps from sparse depth measurements and corresponding RGB images, where how to effectively and efficiently exploit the multi-modal information is a key issue. Guided dynamic filters, which generate spatially-variant depth-wise separable convolutional filters from RGB features to guide depth features, have been proven to be effective in this task. However, the dynamically generated filters require massive model parameters, computational costs and memory footprints when the number of feature channels is large. In this paper, we propose to decompose the guided dynamic filters into a spatially-shared component multiplied by content-adaptive adaptors at each spatial location. Based on the proposed idea, we introduce two decomposition schemes A and B, which decompose the filters by splitting the filter structure and using spatial-wise attention, respectively. The decomposed filters not only maintain the favorable properties of guided dynamic filters as being content-dependent and spatially-variant, but also reduce model parameters and hardware costs, as the learned adaptors are decoupled with the number of feature channels. Extensive experimental results demonstrate that the methods using our schemes outperform state-of-the-art methods on the KITTI dataset, and rank 1st and 2nd on the KITTI benchmark at the time of submission. Meanwhile, they also achieve comparable performance on the NYUv2 dataset. In addition, our proposed methods are general and could be employed as plug-and-play feature fusion blocks in other multi-modal fusion tasks such as RGB-D salient object detection.

KW - Depth completion

KW - feature fusion

KW - guided dynamic filter

KW - multi-modal

KW - range sensing

UR - http://www.scopus.com/inward/record.url?scp=85164440729&partnerID=8YFLogxK

U2 - 10.1109/TCSVT.2023.3292398

DO - 10.1109/TCSVT.2023.3292398

M3 - Article

AN - SCOPUS:85164440729

SN - 1051-8215

VL - 34

SP - 1186

EP - 1198

JO - IEEE Transactions on Circuits and Systems for Video Technology

JF - IEEE Transactions on Circuits and Systems for Video Technology

IS - 2

ER -
