Learnable Cross-Correlation based Filter-and-Sum Networks for Multi-channel Speech Separation

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

1 Scopus citation

Abstract

Multichannel source separation plays an important role in audio and speech signal processing. With recent advancements in deep neural networks (DNNs), numerous DNN-based beamforming algorithms have been developed. To leverage spatial information, a time-domain filter-and-sum network (FaSNet) was introduced, and the transform-average-concatenate (TAC) technique was subsequently adopted to further enhance separation performance. FaSNet captures spatial information by assessing the cosine similarity between different channels, but this approach may have limited spatial resolution and could exhibit bias in noisy, reverberant environments, thereby potentially compromising performance. Motivated by the efficacy of the generalized cross-correlation (GCC) method in achieving reliable source localization in adverse environments, this paper introduces a learnable cross-correlation (LCC) module for FaSNet and FaSNet-TAC. By offering improved flexibility and robustness across diverse environments, LCC enhances source separation performance, as validated by several simulations.
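
To make the contrast in the abstract concrete, the following is a minimal sketch of the two classical inter-channel features it mentions: a cosine similarity evaluated over a range of lags, and a generalized cross-correlation. It is not the paper's LCC module, whose design is not described in this record. The PHAT weighting is one common GCC variant chosen here for illustration, and the function names, frame layout, and parameters are all assumptions.

```python
import numpy as np


def cosine_similarity_lags(ref, context, max_lag):
    """Cosine similarity between a reference frame and lagged frames.

    Hypothetical helper: `context` is the other channel's frame extended by
    `max_lag` samples on each side, so len(context) == len(ref) + 2 * max_lag.
    Entry i of the result corresponds to lag i - max_lag.
    """
    frame_len = len(ref)
    sims = np.empty(2 * max_lag + 1)
    ref_norm = np.linalg.norm(ref)
    for i in range(2 * max_lag + 1):
        seg = context[i:i + frame_len]
        sims[i] = ref @ seg / (ref_norm * np.linalg.norm(seg) + 1e-12)
    return sims


def gcc_phat(x, y, max_lag):
    """Classical GCC with PHAT weighting (an illustrative choice).

    Returns (lags, cc). A peak at a positive lag means y is delayed
    relative to x by that many samples.
    """
    n = len(x) + len(y)                # zero-pad to avoid circular wrap-around
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)
    cross = Y * np.conj(X)             # cross-power spectrum
    cross /= np.abs(cross) + 1e-12     # PHAT: discard magnitude, keep phase
    cc = np.fft.irfft(cross, n=n)
    # Reorder so that lags run from -max_lag to +max_lag.
    cc = np.concatenate((cc[-max_lag:], cc[:max_lag + 1]))
    lags = np.arange(-max_lag, max_lag + 1)
    return lags, cc


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic two-channel example: channel y is channel x delayed by 5 samples.
    x = np.concatenate((rng.standard_normal(1000), np.zeros(24)))
    y = np.roll(x, 5)
    lags, cc = gcc_phat(x, y, max_lag=16)
    print("GCC-PHAT lag estimate:", lags[np.argmax(cc)])

    sims = cosine_similarity_lags(x[100:600], y[100 - 16:600 + 16], max_lag=16)
    print("Cosine-similarity lag estimate:", np.argmax(sims) - 16)
```

On this clean synthetic signal both features should peak at the true delay. The abstract's claim concerns noisy, reverberant conditions, which this sketch does not simulate.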

Original language: English
Title of host publication: APSIPA ASC 2024 - Asia Pacific Signal and Information Processing Association Annual Summit and Conference 2024
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9798350367331
State: Published - 2024
Event: 2024 Asia Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2024 - Macau, China
Duration: 3 Dec 2024 → 6 Dec 2024

Publication series

Name: APSIPA ASC 2024 - Asia Pacific Signal and Information Processing Association Annual Summit and Conference 2024

Conference

Conference: 2024 Asia Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2024
Country/Territory: China
City: Macau
Period: 3/12/24 → 6/12/24

Keywords

  • Multichannel source separation
  • learnable cross-correlation
  • neural network based beamforming
  • spatial information
