Induction Network: Audio-Visual Modality Gap-Bridging for Self-Supervised Sound Source Localization

Tianyu Liu, Peng Zhang, Wei Huang, Yufei Zha, Tao You, Yanning Zhang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

2 Citations (Scopus)

Abstract

Self-supervised sound source localization is usually challenged by modality inconsistency. In recent studies, contrastive-learning-based strategies have shown promise in establishing a consistent correspondence between audio and sound sources in visual scenarios. Unfortunately, insufficient attention to the influence of heterogeneity across the features of different modalities still limits further improvement of this scheme, which motivates our work. In this study, an Induction Network is proposed to bridge the modality gap more effectively. By decoupling the gradients of the visual and audio modalities, discriminative visual representations of sound sources can be learned with the designed Induction Vector in a bootstrap manner, which also enables the audio modality to be aligned with the visual modality consistently. In addition to a visual weighted contrastive loss, an adaptive threshold selection strategy is introduced to enhance the robustness of the Induction Network. Extensive experiments conducted on the SoundNet-Flickr and VGG-Sound Source datasets demonstrate superior performance compared to other state-of-the-art works in different challenging scenarios. The code is available at https://github.com/Tahy1/AVIN.

Original language: English
Title of host publication: MM 2023 - Proceedings of the 31st ACM International Conference on Multimedia
Publisher: Association for Computing Machinery, Inc
Pages: 4042-4052
Number of pages: 11
ISBN (Electronic): 9798400701085
DOI
Publication status: Published - 26 Oct 2023
Event: 31st ACM International Conference on Multimedia, MM 2023 - Ottawa, Canada
Duration: 29 Oct 2023 to 3 Nov 2023

Publication series

Name: MM 2023 - Proceedings of the 31st ACM International Conference on Multimedia

Conference

Conference: 31st ACM International Conference on Multimedia, MM 2023
Country/Territory: Canada
City: Ottawa
Period: 29/10/23 to 3/11/23
