Semantic-Guided Multiview Stereo Reconstruction for Aerial Image

  • Wei Zhang
  • Zhigang Yang
  • Qiang Li
  • Qi Wang
  • Northwestern Polytechnical University Xian

Research output: Contribution to journal › Article › peer-review

3 Scopus citations

Abstract

Learning-based multiview stereo (MVS) depth estimation methods have achieved significant results on large-scale 3-D reconstruction benchmarks. However, in aerial images, adjacent terrain interferes with depth estimation along building edges during the matching process, leading to inaccurate results. To address this problem, we propose a new end-to-end MVS network, named FuS-MVSNet, which fuses monocular depth probability as semantic guidance into the multiview geometry-based MVS framework. By combining the strengths of geometric consistency and local semantics, FuS-MVSNet achieves notable enhancements in both accuracy and robustness. Specifically, we first construct a monocular branch based on the pretrained Depth Anything model to perform monocular metric depth estimation. Because this branch shares no parameters with the multiview branch, its depth estimation is independent of the multiview branch and focuses exclusively on semantic depth inference. Subsequently, to incorporate monocular features into the multiview network, we introduce a volume adaptive fusion module, which adaptively integrates monocular feature volumes into the standard cost volume via an attention mechanism and guides the cost volume regularization. Finally, confidence-based dynamic selection between the two prediction branches ensures that the more robust branch result is chosen under challenging conditions. Qualitative and quantitative results indicate that we achieve competitive performance on multiple benchmarks, including the WHU and LuoJia-MVS datasets.
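
The last two steps of the abstract (fusing monocular information into the multiview cost volume, then choosing between the two branches by confidence) can be illustrated for a single pixel. The following is only a minimal sketch under stated assumptions, not the FuS-MVSNet implementation: the scalar sigmoid gate stands in for the attention mechanism, whose form the abstract does not give, peak probability is assumed as the confidence measure, and all function names and numbers are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_scores(mvs_scores, mono_scores, gate_logit):
    """Blend two per-hypothesis score vectors with a sigmoid gate.

    ASSUMPTION: a single scalar gate stands in for the paper's attention
    mechanism; the real module operates on full feature volumes.
    """
    gate = 1.0 / (1.0 + math.exp(-gate_logit))
    return [gate * m + (1.0 - gate) * c for m, c in zip(mono_scores, mvs_scores)]

def depth_and_confidence(scores, depth_hypotheses):
    """Return (expected depth, peak probability) for one pixel.

    ASSUMPTION: peak probability is used as the confidence measure.
    """
    probs = softmax(scores)
    depth = sum(p * d for p, d in zip(probs, depth_hypotheses))
    return depth, max(probs)

def select_depth(multiview, monocular):
    """Pick the (depth, confidence) pair with the higher confidence."""
    return multiview if multiview[1] >= monocular[1] else monocular

# Hypothetical values for one pixel near a building edge.
depth_hypotheses = [10.0, 20.0, 30.0, 40.0]
mvs_scores = [1.0, 1.1, 1.0, 0.9]   # ambiguous multiview matching
mono_scores = [0.0, 0.5, 4.0, 0.5]  # sharper monocular prior

fused = fuse_scores(mvs_scores, mono_scores, gate_logit=0.0)
multiview = depth_and_confidence(fused, depth_hypotheses)
monocular = depth_and_confidence(mono_scores, depth_hypotheses)
depth, confidence = select_depth(multiview, monocular)
```

In this toy case the multiview matching scores are nearly flat, so the monocular branch ends up with the higher confidence and supplies the final depth. In a real network the same comparison would run per pixel over whole depth maps.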

Original language: English
Article number: 5630611
Journal: IEEE Transactions on Geoscience and Remote Sensing
Volume: 63
State: Published - 2025

Keywords

  • 3-D reconstruction
  • dense image matching
  • monocular depth estimation (MDE)
  • multiview stereo (MVS)
