
SAMDistill: SAM-based Spatial-temporal Distillation for Robust 3D Object Detection

  • Zhaozhong Wang
  • Dian Shao
  • Lei Zhang
  • Zuowei Zhang
  • Binglu Wang
  • Northwestern Polytechnical University Xian

Research output: Contribution to journal › Article › peer-review

Abstract

Multicamera 3D object detection has emerged as a research focus due to its cost-effectiveness. Recent methods perform well on clean datasets but fail in complex environments, where adverse weather induces detection challenges (low foreground-background contrast, occlusion) that also arise in camouflage scenarios. Owing to its training on large-scale datasets, the segment anything model (SAM) has strong generalizability and robustness, but it lacks the ability to capture spatial structure and depth information, which are critical in 3D tasks. To address this issue, we propose SAMDistill, a distillation framework that uses a pretrained LiDAR detector as a teacher and three carefully designed distillation losses: 1) a spatial-temporal feature alignment loss, which ensures geometric consistency in static scenes while propagating cross-frame semantic context; 2) a relation distillation loss, extended to multiscale layers; and 3) an instance distillation loss, which combines regression distillation and classification distillation to guide the model to focus on areas that are difficult to learn. Experiments show that our method achieves state-of-the-art performance on nuScenes and on the noisy dataset nuScenes-C, and demonstrate its generalization across multiple teacher-student configurations.
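
The abstract does not give the loss formulas, so the following is only a generic sketch of two of the ideas it names (feature alignment and relation distillation), not the authors' implementation. All function names, the use of mean squared error, the cosine-similarity affinity matrix, and the weights `w_feat` and `w_rel` are assumptions chosen for illustration.

```python
import math

def mse(a, b):
    """Mean squared error between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def cosine(a, b):
    """Cosine similarity; the small epsilon guards against zero-norm vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + 1e-12)

def affinity(feats):
    """Pairwise cosine-similarity matrix over a list of feature vectors."""
    return [[cosine(f, g) for g in feats] for f in feats]

def feature_loss(student, teacher):
    """Assumed feature-alignment term: per-location MSE, averaged."""
    return sum(mse(s, t) for s, t in zip(student, teacher)) / len(student)

def relation_loss(student, teacher):
    """Assumed relation term: match the pairwise affinity matrices.

    Only the relations between locations are compared, so student and
    teacher vectors may have different channel dimensions.
    """
    a_s, a_t = affinity(student), affinity(teacher)
    n = len(student)
    total = sum((a_s[i][j] - a_t[i][j]) ** 2
                for i in range(n) for j in range(n))
    return total / (n * n)

def distill_loss(student, teacher, w_feat=1.0, w_rel=1.0):
    """Weighted sum of the two illustrative terms (weights are hypothetical)."""
    return (w_feat * feature_loss(student, teacher)
            + w_rel * relation_loss(student, teacher))

# Toy features: three spatial locations, two channels each.
teacher_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
student_feats = [[2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]
print(distill_loss(student_feats, teacher_feats))
```

In this toy case the student features are a scaled copy of the teacher features, so the relation term is close to zero while the feature term is not. This illustrates why the two kinds of loss carry complementary signals.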

Original language: English
Journal: Machine Intelligence Research
State: Accepted/In press - 2026

Keywords

  • 3D noise-resistant detection
  • autonomous driving
  • foundation model
  • knowledge distillation
  • multimodal learning
  • spatial-temporal representation
