
The Hearing Impairment Phenomenon in Audio-Visual Sound Source Localization

  • Northwestern Polytechnical University Xian

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Audio-visual sound source localization (AV-SSL) uses audio to identify the sounding object within the visual space. By mapping visual and audio representations into a shared space, cosine similarity-based methods have demonstrated strong localization performance. In this work, we identify a phenomenon in existing methods, termed Hearing Impairment (HI), in which the network localizes a specific object in the image regardless of the input audio. To measure the extent of HI, we construct three additional audio-visual mismatched datasets (Un-VGGSS, Un-S4 and Un-AVSS) and introduce a novel metric, which is combined with mIoU to evaluate sound source localization performance comprehensively. We train six recent AV-SSL methods on the VGGSound, S4, and AVSS datasets, then evaluate them on VGGSS, S4 Test, and AVSS Test. Results indicate that some methods localize well but fail to distinguish whether the visual object is actually producing the sound. Future work should evaluate a model's ability to differentiate sounding objects, rather than focusing only on localization accuracy.
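
The cosine-similarity localization and the HI behaviour described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names (`cosine`, `localize`, `iou`), the 0.5 threshold, and the toy 2x2 feature grid are hypothetical, and the overlap probe at the end is not the metric introduced in the paper, which the abstract does not specify.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0

def localize(audio_emb, visual_feats, threshold=0.5):
    """Binary mask: 1 where a location's visual feature aligns with the audio.

    visual_feats is an H x W grid of feature vectors that are assumed to
    live in the same embedding space as audio_emb.
    """
    return [[1 if cosine(audio_emb, feat) > threshold else 0 for feat in row]
            for row in visual_feats]

def iou(mask_a, mask_b):
    """Intersection over union of two binary masks of the same shape."""
    inter = union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            inter += a & b
            union += a | b
    # Two empty masks are treated as identical (a convention chosen here).
    return inter / union if union else 1.0

# Toy 2x2 image: the top row resembles one object, the bottom row another.
visual_feats = [[[1.0, 0.0], [0.9, 0.1]],
                [[0.0, 1.0], [0.1, 0.9]]]
ground_truth = [[1, 1], [0, 0]]        # the top-row object is the one sounding

matched_audio = [1.0, 0.0]             # audio that belongs to the top-row object
mismatched_audio = [0.0, 1.0]          # audio that does not match that object

mask_matched = localize(matched_audio, visual_feats)
mask_mismatched = localize(mismatched_audio, visual_feats)

# Localization quality against the ground truth (averaged over a dataset,
# this is what mIoU summarizes).
score_matched = iou(mask_matched, ground_truth)

# Illustrative probe: how much does the prediction change when the audio
# changes? An overlap near 1 would mean the audio is being ignored.
overlap = iou(mask_matched, mask_mismatched)
```

In this toy case the prediction follows the audio, so the overlap between the matched and mismatched masks is zero. A model showing the HI behaviour would return nearly the same mask for both audio inputs, which is why a high IoU against the ground truth alone cannot reveal the problem.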

Original language: English
Title of host publication: 2025 13th International Conference on Information and Communication Networks, ICICN 2025
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 39-44
Number of pages: 6
ISBN (Electronic): 9798331568344
State: Published - 2025
Event: 13th International Conference on Information and Communication Networks, ICICN 2025 - Beijing, China
Duration: 8 Aug 2025 → 11 Aug 2025

Publication series

Name: 2025 13th International Conference on Information and Communication Networks, ICICN 2025

Conference

Conference: 13th International Conference on Information and Communication Networks, ICICN 2025
Country/Territory: China
City: Beijing
Period: 8/08/25 → 11/08/25

Keywords

  • Audio-visual
  • Hearing Impairment
  • Sound Source Localization
  • Sounding Object
