
The Hearing Impairment Phenomenon in Audio-Visual Sound Source Localization

  • Northwestern Polytechnical University, Xi'an

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Audio-visual sound source localization (AV-SSL) uses audio to identify the sounding object within the visual scene. By mapping visual and audio representations into a shared embedding space, cosine similarity-based methods have achieved strong localization performance. In this work, we identify a phenomenon in existing methods, termed Hearing Impairment (HI): the network localizes a particular object in the image regardless of the input audio. To measure the extent of HI, we construct three audio-visual mismatched datasets (Un-VGGSS, Un-S4, and Un-AVSS) and introduce a novel metric that, combined with mIoU, provides a more comprehensive evaluation of sound source localization. We trained six recent AV-SSL methods on the VGGSound, S4, and AVSS datasets and evaluated them on VGGSS, S4 Test, and AVSS Test. The results indicate that some methods localize well but fail to determine whether the visual object is actually producing the sound. Future work should therefore evaluate a model's ability to distinguish sounding objects, rather than focusing only on localization accuracy.
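The cosine similarity-based localization described in the abstract, and the kind of audio-swap check that exposes HI, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code and not their proposed metric (which the abstract does not define): the embeddings are toy 2-D vectors, the 0.5 threshold is an assumed hyperparameter, and the helper names (`similarity_map`, `to_mask`, `iou`, `mean_iou`, `cross_audio_overlap`) are hypothetical. Real AV-SSL methods obtain these features from trained audio and visual encoders.

```python
import math
from typing import List

Vector = List[float]
FeatureGrid = List[List[Vector]]  # H x W grid of per-location visual embeddings
Mask = List[List[bool]]


def cosine(a: Vector, b: Vector, eps: float = 1e-8) -> float:
    """Cosine similarity; eps guards against division by zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + eps)


def similarity_map(visual: FeatureGrid, audio: Vector) -> List[List[float]]:
    """Compare one audio embedding with every visual location in the shared space."""
    return [[cosine(v, audio) for v in row] for row in visual]


def to_mask(sim: List[List[float]], threshold: float = 0.5) -> Mask:
    """Binarize the heatmap (the threshold is an assumption, not from the paper)."""
    return [[s >= threshold for s in row] for row in sim]


def iou(pred: Mask, gt: Mask) -> float:
    """Intersection over union of two binary masks (1.0 by convention if both are empty)."""
    inter = sum(p and g for pr, gr in zip(pred, gt) for p, g in zip(pr, gr))
    union = sum(p or g for pr, gr in zip(pred, gt) for p, g in zip(pr, gr))
    return inter / union if union else 1.0


def mean_iou(preds: List[Mask], gts: List[Mask]) -> float:
    """mIoU, here taken as the average IoU over evaluation samples."""
    scores = [iou(p, g) for p, g in zip(preds, gts)]
    return sum(scores) / len(scores)


def cross_audio_overlap(visual: FeatureGrid, audio_a: Vector, audio_b: Vector,
                        threshold: float = 0.5) -> float:
    """Hypothetical HI probe: IoU between the masks two different audio clips
    produce on the same image. Values near 1.0 mean the map ignores the audio."""
    mask_a = to_mask(similarity_map(visual, audio_a), threshold)
    mask_b = to_mask(similarity_map(visual, audio_b), threshold)
    return iou(mask_a, mask_b)


if __name__ == "__main__":
    # Toy 2 x 2 image: a "dog" at top-left, a "guitar" at top-right, background below.
    visual = [[[1.0, 0.0], [0.0, 1.0]],
              [[-1.0, 0.0], [0.0, -1.0]]]
    dog_audio, guitar_audio = [1.0, 0.0], [0.0, 1.0]
    dog_gt = [[True, False], [False, False]]

    pred = to_mask(similarity_map(visual, dog_audio))
    print("IoU vs ground truth:", iou(pred, dog_gt))
    print("Cross-audio overlap:", cross_audio_overlap(visual, dog_audio, guitar_audio))
```

In this toy setup the map follows the audio, so swapping the dog clip for the guitar clip moves the mask and the cross-audio overlap is 0.0. A model that returned nearly the same mask for matched and mismatched audio (overlap close to 1.0) would be showing the HI behavior described above, even if its IoU against the ground truth were high.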

Original language: English
Host publication title: 2025 13th International Conference on Information and Communication Networks, ICICN 2025
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 39-44
Number of pages: 6
ISBN (electronic): 9798331568344
DOI
Publication status: Published - 2025
Event: 13th International Conference on Information and Communication Networks, ICICN 2025 - Beijing, China
Duration: 8 Aug 2025 to 11 Aug 2025

Publication series

Name: 2025 13th International Conference on Information and Communication Networks, ICICN 2025

Conference

Conference: 13th International Conference on Information and Communication Networks, ICICN 2025
Country/Territory: China
City: Beijing
Period: 8/08/25 to 11/08/25
