Efficient Multi-View Stereo by Dynamic Cost Volume and Cross-Scale Propagation

Shaoqian Wang; Bo Li; Yuchao Dai

doi:10.1109/TCSVT.2024.3398060

School of Electronics and Information

Research output: Contribution to journal › Article › peer-review

5 Citations (Scopus)

Abstract

Currently, learning-based multi-view stereo (MVS) has been dominated by the pipeline of 3D cost volume and regularization network over the static cost volume for depth regression. However, this methodology is plagued by heavy time and memory consumption, which greatly hinders the applications of these methods for real-world high-resolution images. To address these challenges, we present Effi-MVS+, an efficient multi-scale dynamic cost volume based MVS method. Firstly, instead of constructing a static cost volume and predicting a probability distribution map for depth regression, we update the depth map by iteratively predicting depth residuals. In each iteration, we construct a lightweight dynamic cost volume by encoding local matching and regularization information. The dynamic cost volume is subsequently processed using a 2D convolution-based GRU, which owns significant advantages in computational complexity and efficiency. Secondly, we propose a cross-scale propagation mechanism to enhance the multi-scale dynamic cost volume. This mechanism facilitates the progressive aggregation of multi-scale information, thereby providing enhanced matching and regularization information. Thirdly, to further improve the efficiency, we provide a reliable initial depth map to launch the framework and guarantee fast convergence. Extensive experiments on the DTU and Tanks & Temples benchmarks demonstrate the superiority of our method, which outperforms other state-of-the-art methods by a large margin in terms of reconstruction quality, speed, and memory usage. Code will be released at https://github.com/npucvr/Effi-MVS-plus.
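The iterative residual update described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the learned 2D-convolutional GRU and the dynamic cost volume are replaced by a hypothetical toy predictor (`make_toy_predictor`) that simply moves each depth value a fixed fraction toward a known target, and all names below are assumptions made for illustration. The sketch only shows the control flow (depth is refined by repeatedly adding a predicted residual) and why a better initial depth map leaves a smaller error after the same number of iterations.

```python
from typing import Callable, List

Depth = List[float]  # a flattened toy "depth map"


def refine_depth(
    initial: Depth,
    predict_residual: Callable[[Depth], Depth],
    num_iters: int,
) -> Depth:
    """Refine a depth map by repeatedly adding a predicted residual."""
    depth = list(initial)
    for _ in range(num_iters):
        residual = predict_residual(depth)
        depth = [d + r for d, r in zip(depth, residual)]
    return depth


def make_toy_predictor(target: Depth, step: float = 0.5) -> Callable[[Depth], Depth]:
    """Hypothetical stand-in for the learned update operator.

    It returns a residual that moves each value a fixed fraction
    of the way toward a known target; a real network has no access
    to the target and predicts the residual from matching features.
    """
    def predict(depth: Depth) -> Depth:
        return [step * (t - d) for t, d in zip(target, depth)]
    return predict


def max_error(depth: Depth, target: Depth) -> float:
    """Largest absolute per-pixel depth error."""
    return max(abs(d - t) for d, t in zip(depth, target))


target = [2.0, 4.0, 8.0]
predictor = make_toy_predictor(target)

# Same number of iterations, two different starting points.
refined_coarse = refine_depth([1.0, 1.0, 1.0], predictor, num_iters=4)
refined_good = refine_depth([1.9, 4.2, 7.5], predictor, num_iters=4)

print(max_error(refined_coarse, target))
print(max_error(refined_good, target))
```

In this toy setting each iteration halves the remaining error, so the run that starts closer to the target ends closer to it as well; the abstract's claim about fast convergence from a reliable initial depth map refers to the analogous effect in the learned pipeline.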

Original language	English
Pages (from-to)	9414-9427
Number of pages	14
Journal	IEEE Transactions on Circuits and Systems for Video Technology
Volume	34
Issue number	10
DOI	https://doi.org/10.1109/TCSVT.2024.3398060
Publication status	Published - 2024

Cite this

@article{31cc8840b39848459ea8c9e828d79085,

title = "Efficient Multi-View Stereo by Dynamic Cost Volume and Cross-Scale Propagation",

abstract = "Currently, learning-based multi-view stereo (MVS) has been dominated by the pipeline of 3D cost volume and regularization network over the static cost volume for depth regression. However, this methodology is plagued by heavy time and memory consumption, which greatly hinders the applications of these methods for real-world high-resolution images. To address these challenges, we present Effi-MVS+, an efficient multi-scale dynamic cost volume based MVS method. Firstly, instead of constructing a static cost volume and predicting a probability distribution map for depth regression, we update the depth map by iteratively predicting depth residuals. In each iteration, we construct a lightweight dynamic cost volume by encoding local matching and regularization information. The dynamic cost volume is subsequently processed using a 2D convolution-based GRU, which owns significant advantages in computational complexity and efficiency. Secondly, we propose a cross-scale propagation mechanism to enhance the multi-scale dynamic cost volume. This mechanism facilitates the progressive aggregation of multi-scale information, thereby providing enhanced matching and regularization information. Thirdly, to further improve the efficiency, we provide a reliable initial depth map to launch the framework and guarantee fast convergence. Extensive experiments on the DTU and Tanks \& Temples benchmarks demonstrate the superiority of our method, which outperforms other state-of-the-art methods by a large margin in terms of reconstruction quality, speed, and memory usage. Code will be released at https://github.com/npucvr/Effi-MVS-plus.",

keywords = "Multi-view stereo, cross-scale propagation, deep neural networks, dynamic cost volume",

author = "Shaoqian Wang and Bo Li and Yuchao Dai",

note = "Publisher Copyright: {\textcopyright} 2024 IEEE.",

year = "2024",

doi = "10.1109/TCSVT.2024.3398060",

language = "English",

volume = "34",

pages = "9414--9427",

journal = "IEEE Transactions on Circuits and Systems for Video Technology",

issn = "1051-8215",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

number = "10",

}

TY - JOUR

T1 - Efficient Multi-View Stereo by Dynamic Cost Volume and Cross-Scale Propagation

AU - Wang, Shaoqian

AU - Li, Bo

AU - Dai, Yuchao

PY - 2024

Y1 - 2024

AB - Currently, learning-based multi-view stereo (MVS) has been dominated by the pipeline of 3D cost volume and regularization network over the static cost volume for depth regression. However, this methodology is plagued by heavy time and memory consumption, which greatly hinders the applications of these methods for real-world high-resolution images. To address these challenges, we present Effi-MVS+, an efficient multi-scale dynamic cost volume based MVS method. Firstly, instead of constructing a static cost volume and predicting a probability distribution map for depth regression, we update the depth map by iteratively predicting depth residuals. In each iteration, we construct a lightweight dynamic cost volume by encoding local matching and regularization information. The dynamic cost volume is subsequently processed using a 2D convolution-based GRU, which owns significant advantages in computational complexity and efficiency. Secondly, we propose a cross-scale propagation mechanism to enhance the multi-scale dynamic cost volume. This mechanism facilitates the progressive aggregation of multi-scale information, thereby providing enhanced matching and regularization information. Thirdly, to further improve the efficiency, we provide a reliable initial depth map to launch the framework and guarantee fast convergence. Extensive experiments on the DTU and Tanks & Temples benchmarks demonstrate the superiority of our method, which outperforms other state-of-the-art methods by a large margin in terms of reconstruction quality, speed, and memory usage. Code will be released at https://github.com/npucvr/Effi-MVS-plus.

KW - Multi-view stereo

KW - cross-scale propagation

KW - deep neural networks

KW - dynamic cost volume

UR - http://www.scopus.com/inward/record.url?scp=85193016214&partnerID=8YFLogxK

U2 - 10.1109/TCSVT.2024.3398060

DO - 10.1109/TCSVT.2024.3398060

M3 - Article

AN - SCOPUS:85193016214

SN - 1051-8215

VL - 34

SP - 9414

EP - 9427

JO - IEEE Transactions on Circuits and Systems for Video Technology

JF - IEEE Transactions on Circuits and Systems for Video Technology

IS - 10

ER -
