Deep Learning-Based Sound Source Localization: A Review

  • Kunbo Xu
  • Zekai Zong
  • Dongjun Liu
  • Ran Wang
  • Liang Yu
  • Shanghai Maritime University
  • China Aerodynamics Research and Development Center
  • State Key Laboratory of Airliner Integration Technology and Flight Simulation

Research output: Contribution to journal; Review article; peer-review

5 Scopus citations

Abstract

As a fundamental technology in environmental perception, sound source localization (SSL) plays a critical role in public safety, marine exploration, and smart home systems. However, traditional methods such as beamforming and time-delay estimation rely on manually designed physical models and idealized assumptions, which struggle to meet practical demands in dynamic and complex scenarios. Recent advancements in deep learning have revolutionized SSL by leveraging its end-to-end feature adaptability, cross-scenario generalization capabilities, and data-driven modeling, significantly enhancing localization robustness and accuracy in challenging environments. This review systematically examines the progress of deep learning-based SSL across three critical domains: marine environments, indoor reverberant spaces, and unmanned aerial vehicle (UAV) monitoring. In marine scenarios, complex-valued convolutional networks combined with adversarial transfer learning mitigate environmental mismatch and multipath interference through phase information fusion and domain adaptation strategies. For indoor high-reverberation conditions, attention mechanisms and multimodal fusion architectures achieve precise localization under low signal-to-noise ratios by adaptively weighting critical acoustic features. In UAV surveillance, lightweight models integrated with spatiotemporal Transformers address dynamic modeling of non-stationary noise spectra and edge computing efficiency constraints. Despite these advancements, current approaches face three core challenges: the insufficient integration of physical principles, prohibitive data annotation costs, and the trade-off between real-time performance and accuracy. Future research should prioritize physics-informed modeling to embed acoustic propagation mechanisms, unsupervised domain adaptation to reduce reliance on labeled data, and sensor-algorithm co-design to optimize hardware-software synergy. 
These directions aim to propel SSL toward intelligent systems characterized by high precision, strong robustness, and low power consumption. This work provides both theoretical foundations and technical references for algorithm selection and practical implementation in complex real-world scenarios.

Original language: English
Article number: 7419
Journal: Applied Sciences (Switzerland)
Volume: 15
Issue number: 13
State: Published - Jul 2025

UN SDGs

This output contributes to the following UN Sustainable Development Goals (SDGs)

  1. SDG 14 - Life Below Water

Keywords

  • complex environment
  • deep learning
  • model architectures
  • robustness
  • sound source localization
