How does Layer Normalization improve Batch Normalization in self-supervised sound source localization?

Research output: Contribution to journal › Article › peer-review

11 Scopus citations

Abstract

Self-supervised sound source localization is usually challenged by unexpectedly large inputs and by the incorrect direction of normalization in current solutions. A promising way to address this challenge is to avoid feature deformation by incorporating more effective normalization, which is the motivation of this study. Based on a mathematical derivation of the scale independence of Layer Normalization (LN), this work proposes a correspondence consolidation method to reinforce the audio–visual correspondence. By combining input feature normalization with an LN-based SimSiam predictor, joint gradient stabilization can further be achieved for more accurate sound source localization. Extensive experiments on the SoundNet-Flickr and VGG-Sound Source datasets verify superior performance compared with other state-of-the-art works.
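The scale independence mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' method or code; it only shows the textbook property that normalizing each sample over its own features (LN) is unaffected when a single sample is rescaled, whereas normalizing each feature over the batch (BN) is not. The toy batch, the helper names, and the omission of the epsilon term and learned affine parameters are all assumptions made for illustration.

```python
import math

def normalize(values):
    """Shift to zero mean and scale to unit variance (no epsilon, no learned affine)."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    std = math.sqrt(var)
    return [(v - mean) / std for v in values]

def layer_norm(batch):
    """Normalize each sample (row) over its own features."""
    return [normalize(row) for row in batch]

def batch_norm(batch):
    """Normalize each feature (column) over the samples in the batch."""
    columns = [normalize(list(col)) for col in zip(*batch)]
    return [list(row) for row in zip(*columns)]

def close(a, b, tol=1e-9):
    """Element-wise comparison of two nested lists of floats."""
    return all(abs(x - y) <= tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))

# Hypothetical toy batch: 3 samples, 4 features each.
batch = [[1.0, 2.0, 4.0, 7.0],
         [0.5, 3.0, 1.0, 2.5],
         [2.0, 1.0, 3.0, 0.0]]

# Rescale only the first sample to mimic one unexpectedly large input.
scaled = [[100.0 * v for v in batch[0]]] + batch[1:]

ln_unchanged = close(layer_norm(batch), layer_norm(scaled))  # LN output is unaffected
bn_unchanged = close(batch_norm(batch), batch_norm(scaled))  # BN output shifts for every sample

print("LN unchanged:", ln_unchanged)
print("BN unchanged:", bn_unchanged)
```

In this sketch the large sample changes the per-feature batch statistics, so BN alters the normalized values of the other samples as well, while LN treats each sample independently.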

Original language: English
Article number: 127040
Journal: Neurocomputing
Volume: 567
State: Published - 28 Jan 2024

Keywords

  • Audio-visual
  • Batch Normalization
  • Layer Normalization
  • Sound source localization
