HV-BEV: Decoupling Horizontal and Vertical Feature Sampling for Multi-View 3D Object Detection

  • Di Wu
  • Feng Yang
  • Benlian Xu
  • Pan Liao
  • Wenhui Zhao
  • Dingwen Zhang
  • Northwestern Polytechnical University Xian
  • Suzhou University of Science and Technology

Research output: Contribution to journal › Article › peer-review

3 Scopus citations

Abstract

Vision-based multi-view perception systems, especially those using bird’s-eye view (BEV) representations, have become increasingly important in autonomous driving. Current state-of-the-art methods typically transform image features from multiple camera views into the BEV space via explicit or implicit depth estimation. However, they often rely on uniform or fixed-height sampling and lack height-aware priors, making them insensitive to the fact that different object categories occupy distinct local height ranges. Furthermore, most approaches treat BEV features as independent across grid locations, overlooking the structured correlations between different parts of an object in 3D space. These limitations hinder accurate spatial reasoning in both vertical and horizontal dimensions. To address these issues, we propose HV-BEV, a novel BEV perception framework that decouples the feature sampling process into Horizontal feature aggregation and Vertical adaptive height-aware reference point sampling. Specifically, for horizontal modeling, we dynamically construct a set of relevant neighboring points on the ground-aligned plane for each 3D reference point, facilitating structured cross-view feature aggregation and promoting consistent representation of large or partially visible objects. For vertical modeling, we introduce an adaptive height-aware module that leverages historical information to guide 3D reference points to focus on the plausible height regions where objects of interest are likely to appear, replacing fixed uniform height sampling. Extensive experiments on the nuScenes dataset demonstrate the effectiveness of our method. Our HV-BEV framework consistently outperforms baselines, achieving 50.5% mAP and 59.8% NDS on the nuScenes test set.
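The decoupling described in the abstract can be illustrated with a small geometric sketch. This is not the authors' implementation: the function names, the height range of -5 m to 3 m, the `center` and `half_width` parameters (standing in for whatever the history-guided height module would predict), and the fixed neighbor offsets are all assumptions made for illustration. The sketch only contrasts fixed uniform height sampling with sampling concentrated in a plausible height band, and shows neighbor points generated on the horizontal plane of a 3D reference point.

```python
from typing import List, Sequence, Tuple

Point3D = Tuple[float, float, float]


def uniform_heights(z_min: float, z_max: float, n: int) -> List[float]:
    """Fixed uniform sampling: n evenly spaced heights over [z_min, z_max]."""
    if n == 1:
        return [(z_min + z_max) / 2.0]
    step = (z_max - z_min) / (n - 1)
    return [z_min + i * step for i in range(n)]


def adaptive_heights(center: float, half_width: float, n: int,
                     z_min: float, z_max: float) -> List[float]:
    """Height-aware sampling (illustrative): n heights inside a band around
    `center`, clamped to the overall range. In a real model `center` and
    `half_width` would be predicted; here they are plain inputs."""
    lo = max(z_min, center - half_width)
    hi = min(z_max, center + half_width)
    return uniform_heights(lo, hi, n)


def horizontal_neighbors(ref: Point3D,
                         offsets: Sequence[Tuple[float, float]]) -> List[Point3D]:
    """Neighbor points on the horizontal plane through `ref` (same height).
    The offsets are fixed here; the abstract describes them as constructed
    dynamically."""
    x, y, z = ref
    return [(x + dx, y + dy, z) for dx, dy in offsets]


if __name__ == "__main__":
    # Hypothetical BEV cell at (10 m, 2 m) with an assumed height range.
    fixed = uniform_heights(-5.0, 3.0, 5)
    focused = adaptive_heights(center=0.0, half_width=1.0, n=3,
                               z_min=-5.0, z_max=3.0)
    for z in focused:
        ref = (10.0, 2.0, z)
        print(ref, horizontal_neighbors(ref, [(1.0, 0.0), (-1.0, 0.0)]))
    print("uniform:", fixed)
```

In this toy setup the uniform sampler spreads its points over the full height range, while the adaptive sampler spends the same budget inside a narrow band, which is the intuition behind replacing fixed uniform height sampling with height-aware reference points.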

Original language: English
Pages (from-to): 18734-18746
Number of pages: 13
Journal: IEEE Transactions on Intelligent Transportation Systems
Volume: 26
Issue number: 11
State: Published - 2025

Keywords

  • 3D object detection
  • autonomous driving
  • bird’s-eye view (BEV) representation
  • multi-camera
  • multi-view
