Abstract
Vision-based multi-view perception systems, especially those using bird’s-eye view (BEV) representations, have become increasingly important in autonomous driving. Current state-of-the-art methods typically transform image features from multiple camera views into the BEV space via explicit or implicit depth estimation. However, they often rely on uniform or fixed-height sampling and lack height-aware priors, making them insensitive to the fact that different object categories occupy distinct local height ranges. Furthermore, most approaches treat BEV features as independent across grid locations, overlooking the structured correlations between different parts of an object in 3D space. These limitations hinder accurate spatial reasoning in both vertical and horizontal dimensions. To address these issues, we propose HV-BEV, a novel BEV perception framework that decouples the feature sampling process into Horizontal feature aggregation and Vertical adaptive height-aware reference point sampling. Specifically, for horizontal modeling, we dynamically construct a set of relevant neighboring points on the ground-aligned plane for each 3D reference point, facilitating structured cross-view feature aggregation and promoting consistent representation of large or partially visible objects. For vertical modeling, we introduce an adaptive height-aware module that leverages historical information to guide 3D reference points to focus on the plausible height regions where objects of interest are likely to appear, replacing fixed uniform height sampling. Extensive experiments on the nuScenes dataset demonstrate the effectiveness of our method. Our HV-BEV framework consistently outperforms baselines, achieving 50.5% mAP and 59.8% NDS on the nuScenes test set.
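The decoupling described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The function names, the `center` and `half_range` parameters, and the fixed ground-plane `offsets` are all hypothetical. In HV-BEV the height focus is guided by historical information and the neighboring points are constructed dynamically, whereas here both are passed in as plain inputs to show only the geometry of the sampling.

```python
from typing import List, Tuple

Point3D = Tuple[float, float, float]


def uniform_heights(z_min: float, z_max: float, n: int) -> List[float]:
    """Fixed baseline: n evenly spaced heights across the whole pillar."""
    if n == 1:
        return [(z_min + z_max) / 2.0]
    step = (z_max - z_min) / (n - 1)
    return [z_min + i * step for i in range(n)]


def adaptive_heights(center: float, half_range: float,
                     z_min: float, z_max: float, n: int) -> List[float]:
    """Height-aware variant: n heights packed around an assumed center.

    `center` and `half_range` stand in for whatever a learned height prior
    would predict (an assumption of this sketch); the window is clipped so
    that it never leaves the pillar [z_min, z_max].
    """
    lo = max(z_min, center - half_range)
    hi = min(z_max, center + half_range)
    return uniform_heights(lo, hi, n)


def reference_points(x: float, y: float, heights: List[float],
                     offsets: List[Tuple[float, float]]) -> List[Point3D]:
    """Vertical step, then horizontal step, for one BEV grid cell.

    For every sampled height, emit the reference point itself followed by
    its neighbors on the ground-aligned plane at that same height.
    """
    points: List[Point3D] = []
    for z in heights:
        points.append((x, y, z))
        points.extend((x + dx, y + dy, z) for dx, dy in offsets)
    return points


if __name__ == "__main__":
    # Illustrative values only: a pillar from -5 m to 3 m.
    baseline = uniform_heights(-5.0, 3.0, 5)
    focused = adaptive_heights(0.5, 1.0, -5.0, 3.0, 3)
    pts = reference_points(10.0, 2.0, focused, [(1.0, 0.0), (-1.0, 0.0)])
    print(baseline, focused, len(pts))
```

With the same number of samples, the adaptive variant spends them inside a narrow height window instead of spreading them across the full pillar, which is the intuition behind replacing fixed uniform height sampling.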
| Original language | English |
|---|---|
| Pages (from-to) | 18734-18746 |
| Number of pages | 13 |
| Journal | IEEE Transactions on Intelligent Transportation Systems |
| Volume | 26 |
| Issue number | 11 |
| DOIs | |
| State | Published - 2025 |
Keywords
- 3D object detection
- autonomous driving
- bird’s-eye view (BEV) representation
- multi-camera
- multi-view
Fingerprint
Dive into the research topics of 'HV-BEV: Decoupling Horizontal and Vertical Feature Sampling for Multi-View 3D Object Detection'. Together they form a unique fingerprint.