Visual Consistency Enhancement for Multiview Stereo Reconstruction in Remote Sensing

Wei Zhang, Qiang Li, Yuan Yuan, Qi Wang

Research output: Contribution to journal › Article › peer-review

3 Scopus citations

Abstract

Learnable multiview stereo (MVS) depth estimation for aerial images has achieved great success in 3-D digital urban reconstruction. Currently, most large-scale depth estimation methods rely heavily on adapting the general MVS framework. However, these methods often overlook the cross-view interval and limited viewpoints inherent in aerial image data. In this article, we introduce AggrMVS, a learning-based MVS method for aerial image depth estimation that enhances visual consistency to address the insufficient accuracy caused by these characteristics of aerial image data. First, an optical flow-guided feature extraction module is introduced to map the dynamic relationship between reference and source images. It explicitly captures edge information of different depth components to guide the cost volume regularization. Second, a cross-view volume fusion module is proposed to enhance the interaction among reference volumes, further improving the aggregation ability of the source volume. Furthermore, AggrMVS achieves refined aerial image depth estimation results with a lightweight cascade architecture. Since low-altitude oblique aerial datasets are currently lacking, we reconstruct a multicategory synthetic aerial scene benchmark from general MVS datasets. The benchmark dataset is available at https://github.com/ToscW/BlendedUAV. Experiments on public and proposed datasets confirm that AggrMVS outperforms other MVS depth estimation methods both qualitatively and quantitatively.

Original language: English
Article number: 5646011
Journal: IEEE Transactions on Geoscience and Remote Sensing
Volume: 62
State: Published - 2024

Keywords

  • 3-D reconstruction
  • dense image matching
  • multiview stereo (MVS)
  • vision consistency
