Abstract
Multicamera 3D object detection has become a research focus because of its cost-effectiveness. Recent methods perform well on clean datasets but fail in complex environments, where adverse weather introduces detection challenges (such as low foreground-background contrast and occlusion) similar to those found in camouflage scenarios. Because it is trained on large-scale datasets, the segment anything model (SAM) has strong generalizability and robustness, but it lacks the ability to capture the spatial structure and depth information that are critical in 3D tasks. To address this issue, we propose SAMDistill, a distillation framework that uses a pretrained LiDAR detector as the teacher together with three carefully designed distillation losses: 1) a spatial-temporal feature alignment loss, which ensures geometric consistency in static scenes while propagating cross-frame semantic context; 2) a relation distillation loss, which we extend to multiscale layers; and 3) an instance distillation loss, which combines regression distillation and classification distillation to guide the model toward areas that are difficult to learn. Experiments show that our method achieves state-of-the-art performance on nuScenes and on the noisy dataset nuScenes-C, and demonstrate its generalization across multiple teacher-student configurations.
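The abstract names three loss families but does not give their formulas. The NumPy sketch below shows common generic forms of each, purely as an illustration under stated assumptions; it is not the SAMDistill implementation. All function names, the masked MSE for feature alignment, the cosine-similarity relation matrices, the L1 plus temperature-scaled KL instance loss, and the optional per-instance `weights` are hypothetical choices. The temporal (cross-frame) part of the alignment is omitted, and student features are assumed to be already projected to the teacher's BEV grid and channel count.

```python
import numpy as np


def feature_alignment_loss(student_bev, teacher_bev, mask=None):
    """Masked MSE between (C, H, W) BEV features (spatial part only, hypothetical form).

    Assumes an adapter already mapped the student to the teacher's channels;
    `mask` is an optional (H, W) weight map, e.g. a hypothetical foreground mask.
    """
    per_cell = ((student_bev - teacher_bev) ** 2).mean(axis=0)  # (H, W)
    if mask is None:
        return float(per_cell.mean())
    return float((per_cell * mask).sum() / max(mask.sum(), 1e-8))


def relation_matrix(feat):
    """Cosine similarity between every pair of spatial positions of a (C, H, W) map."""
    c = feat.shape[0]
    x = feat.reshape(c, -1).T  # (H*W, C): one row per spatial position
    x = x / (np.linalg.norm(x, axis=1, keepdims=True) + 1e-8)
    return x @ x.T  # (H*W, H*W)


def multiscale_relation_loss(student_feats, teacher_feats):
    """Average MSE between student and teacher relation matrices over all scales.

    Spatial sizes must match per scale; channel counts may differ.
    """
    losses = [np.mean((relation_matrix(s) - relation_matrix(t)) ** 2)
              for s, t in zip(student_feats, teacher_feats)]
    return float(np.mean(losses))


def softmax(logits, temperature):
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def instance_distillation_loss(s_boxes, t_boxes, s_logits, t_logits, weights=None,
                               temperature=2.0, reg_weight=1.0, cls_weight=1.0):
    """Per-instance L1 box regression distillation plus temperature-scaled KL(t || s).

    s_boxes/t_boxes: (N, D) matched box parameters; s_logits/t_logits: (N, K).
    `weights` is a hypothetical hook for emphasizing hard-to-learn instances.
    """
    reg = np.abs(s_boxes - t_boxes).sum(axis=1)  # (N,)
    p_t = softmax(t_logits, temperature)
    log_p_t = np.log(p_t + 1e-12)
    log_p_s = np.log(softmax(s_logits, temperature) + 1e-12)
    cls = (p_t * (log_p_t - log_p_s)).sum(axis=1) * temperature ** 2  # (N,)
    per_inst = reg_weight * reg + cls_weight * cls
    if weights is None:
        weights = np.ones_like(per_inst)
    return float((weights * per_inst).sum() / max(weights.sum(), 1e-8))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    s_bev, t_bev = rng.normal(size=(2, 64, 16, 16))
    print("feature:", feature_alignment_loss(s_bev, t_bev))
    # Student and teacher use different channel counts at each scale.
    s_feats = [rng.normal(size=(32, 8, 8)), rng.normal(size=(32, 4, 4))]
    t_feats = [rng.normal(size=(128, 8, 8)), rng.normal(size=(128, 4, 4))]
    print("relation:", multiscale_relation_loss(s_feats, t_feats))
    print("instance:", instance_distillation_loss(
        rng.normal(size=(5, 7)), rng.normal(size=(5, 7)),
        rng.normal(size=(5, 10)), rng.normal(size=(5, 10))))
```

Because each relation matrix is (H*W) x (H*W), the student and teacher may have different channel counts at every scale, which is one reason relation-based losses are a frequent choice when the teacher is a LiDAR detector and the student is camera-based. The feature-alignment term, in contrast, needs matching shapes, which is why the sketch assumes an adapter.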
| Original language | English |
|---|---|
| Journal | Machine Intelligence Research |
| DOIs | |
| State | Accepted/In press - 2026 |
Keywords
- 3D noise-resistant detection
- autonomous driving
- foundation model
- knowledge distillation
- multimodal learning
- spatial-temporal representation
Fingerprint
Dive into the research topics of 'SAMDistill: SAM-based Spatial-temporal Distillation for Robust 3D Object Detection'. Together they form a unique fingerprint.