Cross-View Geo-Localization via 3D Gaussian Splatting-Based Novel View Synthesis

  • Xiaokun Ding
  • Xuanyu Zhang
  • Shangzhen Song
  • Bo Li
  • Le Hui
  • Yuchao Dai
Research output: Contribution to journal › Article › peer-review

Abstract

Highlights

What are the main findings?

  • We propose a pipeline that enhances cross-view geo-localization (CVGL) by integrating novel view synthesis.
  • The core of our framework reduces the cross-view feature discrepancy by generating perspective-aware overhead images, leading to superior geo-localization accuracy.
  • A novel camera pose generation method is designed specifically for autonomous driving scenarios to address the challenge of missing vertical-view poses.

What are the implications of the main findings?

  • The proposed method establishes a continuous feature transition between street-level and satellite imagery, thereby enhancing the model’s capability in cross-view geo-localization tasks.
  • By integrating 3D Gaussian Splatting (3DGS)-based novel view synthesis into deep learning frameworks for CVGL, our approach enables the autonomous generation of corresponding bird’s-eye-view images directly from street-view inputs.

Cross-view geo-localization allows an agent to determine its own position by retrieving the same scene from images taken from dramatically different perspectives. However, image matching and retrieval face significant challenges due to substantial viewpoint differences, unknown orientations, and considerable geometric disparities between cross-view images. To this end, we propose a cross-view geo-localization framework based on novel view synthesis that generates pseudo aerial-view images from given street-view scenes to reduce the view discrepancy, thereby improving geo-localization performance. Specifically, we first employ 3D Gaussian splatting to generate new aerial images from the street-view image sequence, where COLMAP is used to obtain initial camera poses and sparse point clouds. To identify optimal matching viewpoints in the reconstructed 3D scenes, we design an effective camera pose estimation strategy: by increasing the tilt angle between the photographic axis and the horizontal plane, the geometric consistency between the newly generated aerial images and the real ones is improved. DINOv2 is then employed in a simple yet efficient mixed feature enhancement module, followed by the InfoNCE loss for cross-view geo-localization. Experimental results on the KITTI dataset demonstrate that our approach significantly improves cross-view matching accuracy under large viewpoint disparities and achieves state-of-the-art localization performance.
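The tilt-angle idea in the pose generation strategy can be illustrated with a minimal sketch: rotating a camera-to-world pose about its own horizontal (right) axis so the optical axis points further downward, toward an aerial-like viewpoint. The function name, the pose convention (camera looks along its local +z axis), and the 45° angle are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def tilt_camera_pose(c2w: np.ndarray, tilt_deg: float) -> np.ndarray:
    """Pitch a 4x4 camera-to-world pose about the camera's own right (x)
    axis, increasing the angle between the optical axis and the horizontal
    plane. Assumes the camera looks along its local +z axis."""
    t = np.deg2rad(tilt_deg)
    # Rotation about the camera x-axis, composed in the camera frame.
    Rx = np.array([[1.0, 0.0,        0.0],
                   [0.0, np.cos(t), -np.sin(t)],
                   [0.0, np.sin(t),  np.cos(t)]])
    tilted = c2w.copy()
    tilted[:3, :3] = c2w[:3, :3] @ Rx  # translation is left unchanged
    return tilted

# A street-level camera at the origin, looking along world +z:
street_pose = np.eye(4)
aerial_like_pose = tilt_camera_pose(street_pose, 45.0)
```

Only the orientation changes here; in practice the camera position would also be raised to render a plausible overhead view from the reconstructed scene.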
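The InfoNCE objective mentioned above can be sketched in its standard symmetric form for paired street/aerial embeddings, where matched pairs lie on the diagonal of the similarity matrix. This is a generic formulation; the function name, batch layout, and temperature value are illustrative assumptions, not the authors' exact training code.

```python
import numpy as np

def info_nce(street: np.ndarray, aerial: np.ndarray, temperature: float = 0.07) -> float:
    """Symmetric InfoNCE over a batch of paired embeddings (B, D).
    Row i of `street` and row i of `aerial` depict the same scene."""
    s = street / np.linalg.norm(street, axis=1, keepdims=True)
    a = aerial / np.linalg.norm(aerial, axis=1, keepdims=True)
    logits = (s @ a.T) / temperature  # (B, B) scaled cosine similarities

    def diag_cross_entropy(m: np.ndarray) -> float:
        # Cross-entropy with the matched (diagonal) entry as the target.
        m = m - m.max(axis=1, keepdims=True)          # numerical stability
        log_p = m - np.log(np.exp(m).sum(axis=1, keepdims=True))
        return float(-np.mean(np.diag(log_p)))

    # Average the street->aerial and aerial->street retrieval directions.
    return 0.5 * (diag_cross_entropy(logits) + diag_cross_entropy(logits.T))
```

With well-separated matched pairs the loss approaches zero, while misaligned pairings are penalized, which is what drives the street and synthesized-aerial features toward a shared embedding space.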

Original language: English
Article number: 3673
Journal: Remote Sensing
Volume: 17
Issue number: 22
DOIs
State: Published - Nov 2025

Keywords

  • 3D Gaussian splatting
  • contrastive learning
  • cross-view geo-localization
  • novel view synthesis
