Visual Consistency Enhancement for Multiview Stereo Reconstruction in Remote Sensing

Wei Zhang; Qiang Li; Yuan Yuan; Qi Wang

doi:10.1109/TGRS.2024.3482697

School of Artificial Intelligence, OPtics and Electronics

Northwestern Polytechnical University Xian

Research output: Contribution to journal › Article › peer-review

3 Scopus citations

Abstract

Learnable multiview stereo (MVS) depth estimation for aerial images has achieved great success in 3-D digital urban reconstruction. Currently, most large-scale depth estimation methods rely heavily on adapting the general MVS framework. However, these methods often overlook the cross-view interval and limited viewpoints inherent in aerial image data. In this article, we introduce AggrMVS, a learning-based MVS method for aerial image depth estimation that enhances visual consistency to address the insufficient accuracy caused by the characteristics of aerial image data. First, an optical flow-guided feature extraction module is introduced to map the dynamic relationship between reference and source images. It explicitly captures edge information of different depth components to guide the cost volume regularization. Second, a cross-view volume fusion module is proposed to enhance the interaction among reference volumes, further improving the aggregation ability of the source volume. Furthermore, AggrMVS achieves refined aerial image depth estimation results with a lightweight cascade architecture. Since low-altitude oblique aerial datasets are currently lacking, we reconstruct a multicategory synthetic aerial scene benchmark from general MVS datasets. The benchmark dataset is available at https://github.com/ToscW/BlendedUAV. Experiments on public and proposed datasets confirm that AggrMVS outperforms other MVS depth estimation methods both qualitatively and quantitatively.

Original language	English
Article number	5646011
Journal	IEEE Transactions on Geoscience and Remote Sensing
Volume	62
DOIs	https://doi.org/10.1109/TGRS.2024.3482697
State	Published - 2024

Keywords

3-D reconstruction
dense image matching
multiview stereo (MVS)
vision consistency

Cite this

@article{7ffc93a4440640dd8c1124b8c6e84ee5,

title = "Visual Consistency Enhancement for Multiview Stereo Reconstruction in Remote Sensing",

abstract = "Learnable multiview stereo (MVS) depth estimation for aerial images has achieved great success in 3-D digital urban reconstruction. Currently, most large-scale depth estimation methods rely heavily on adapting the general MVS framework. However, these methods often overlook the cross-view interval and limited viewpoints inherent in aerial image data. In this article, we introduce AggrMVS, a learning-based MVS method for aerial image depth estimation that enhances visual consistency to address the insufficient accuracy caused by the characteristics of aerial image data. First, an optical flow-guided feature extraction module is introduced to map the dynamic relationship between reference and source images. It explicitly captures edge information of different depth components to guide the cost volume regularization. Second, a cross-view volume fusion module is proposed to enhance the interaction among reference volumes, further improving the aggregation ability of the source volume. Furthermore, AggrMVS achieves refined aerial image depth estimation results with a lightweight cascade architecture. Since low-altitude oblique aerial datasets are currently lacking, we reconstruct a multicategory synthetic aerial scene benchmark from general MVS datasets. The benchmark dataset is available at https://github.com/ToscW/BlendedUAV. Experiments on public and proposed datasets confirm that AggrMVS outperforms other MVS depth estimation methods both qualitatively and quantitatively.",

keywords = "3-D reconstruction, dense image matching, multiview stereo (MVS), vision consistency",

author = "Wei Zhang and Qiang Li and Yuan Yuan and Qi Wang",

note = "Publisher Copyright: {\textcopyright} 2024 IEEE.",

year = "2024",

doi = "10.1109/TGRS.2024.3482697",

language = "English",

volume = "62",

journal = "IEEE Transactions on Geoscience and Remote Sensing",

issn = "0196-2892",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

}

TY - JOUR

T1 - Visual Consistency Enhancement for Multiview Stereo Reconstruction in Remote Sensing

AU - Zhang, Wei

AU - Li, Qiang

AU - Yuan, Yuan

AU - Wang, Qi

PY - 2024

Y1 - 2024

AB - Learnable multiview stereo (MVS) depth estimation for aerial images has achieved great success in 3-D digital urban reconstruction. Currently, most large-scale depth estimation methods rely heavily on adapting the general MVS framework. However, these methods often overlook the cross-view interval and limited viewpoints inherent in aerial image data. In this article, we introduce AggrMVS, a learning-based MVS method for aerial image depth estimation that enhances visual consistency to address the insufficient accuracy caused by the characteristics of aerial image data. First, an optical flow-guided feature extraction module is introduced to map the dynamic relationship between reference and source images. It explicitly captures edge information of different depth components to guide the cost volume regularization. Second, a cross-view volume fusion module is proposed to enhance the interaction among reference volumes, further improving the aggregation ability of the source volume. Furthermore, AggrMVS achieves refined aerial image depth estimation results with a lightweight cascade architecture. Since low-altitude oblique aerial datasets are currently lacking, we reconstruct a multicategory synthetic aerial scene benchmark from general MVS datasets. The benchmark dataset is available at https://github.com/ToscW/BlendedUAV. Experiments on public and proposed datasets confirm that AggrMVS outperforms other MVS depth estimation methods both qualitatively and quantitatively.

KW - 3-D reconstruction

KW - dense image matching

KW - multiview stereo (MVS)

KW - vision consistency

UR - http://www.scopus.com/inward/record.url?scp=85207337589&partnerID=8YFLogxK

U2 - 10.1109/TGRS.2024.3482697

DO - 10.1109/TGRS.2024.3482697

M3 - Article

AN - SCOPUS:85207337589

SN - 0196-2892

VL - 62

JO - IEEE Transactions on Geoscience and Remote Sensing

JF - IEEE Transactions on Geoscience and Remote Sensing

M1 - 5646011

ER -
