Learnable Cross-Correlation based Filter-and-Sum Networks for Multi-channel Speech Separation

Xianrui Wang, Shiqi Zhang, Bo He, Shoji Makino, Jingdong Chen

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Multichannel source separation plays an important role in audio and speech signal processing. With recent advances in deep neural networks (DNNs), numerous DNN-based beamforming algorithms have been developed. To leverage spatial information, a time-domain filter-and-sum network (FaSNet) was introduced, and the transform-average-concatenate (TAC) technique was subsequently adopted to further enhance separation performance. FaSNet captures spatial information by computing the cosine similarity between different channels, but this approach may have limited spatial resolution and could exhibit bias in noisy, reverberant environments, potentially compromising performance. Motivated by the efficacy of the generalized cross-correlation (GCC) method in achieving reliable source localization in adverse environments, this paper introduces a learnable cross-correlation (LCC) module for FaSNet and FaSNet-TAC. By offering improved flexibility and robustness across diverse environments, LCC enhances source separation performance, as validated by several simulations.
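To make the contrast in the abstract concrete, the following is a minimal NumPy sketch of the two kinds of inter-channel features it mentions: a per-lag cosine similarity and a classical GCC with PHAT weighting. It is not the paper's learnable LCC module, whose details are not given in this record. The function names, the 1024-sample frame, the 16-sample lag range, and the 5-sample delay are all assumptions chosen for illustration.

```python
import numpy as np

def cosine_similarity_by_lag(ref, other, max_lag):
    """Cosine similarity between `ref` and `other` shifted by each lag in
    [-max_lag, max_lag]. Hypothetical helper, for illustration only."""
    n = len(ref)
    padded = np.pad(other, max_lag)  # zero-pad both ends so every lag is valid
    sims = np.empty(2 * max_lag + 1)
    for i, lag in enumerate(range(-max_lag, max_lag + 1)):
        seg = padded[max_lag + lag : max_lag + lag + n]  # seg[k] = other[k + lag]
        denom = np.linalg.norm(ref) * np.linalg.norm(seg) + 1e-8
        sims[i] = np.dot(ref, seg) / denom
    return sims

def gcc_phat(ref, other, max_lag):
    """Classical GCC with PHAT weighting (not the learnable variant).
    Returns values for lags -max_lag..max_lag, in that order."""
    n_fft = 2 * len(ref)  # zero-pad to limit circular wrap-around
    ref_spec = np.fft.rfft(ref, n_fft)
    other_spec = np.fft.rfft(other, n_fft)
    cross = other_spec * np.conj(ref_spec)
    cross /= np.abs(cross) + 1e-8  # PHAT: keep phase, discard magnitude
    cc = np.fft.irfft(cross, n_fft)
    # Negative lags sit at the end of the inverse FFT output.
    return np.concatenate((cc[-max_lag:], cc[: max_lag + 1]))

# Usage: a white-noise source and a copy delayed by an assumed 5 samples.
rng = np.random.default_rng(0)
source = rng.standard_normal(1024)
delay, max_lag = 5, 16
delayed = np.concatenate((np.zeros(delay), source[:-delay]))

lag_cos = int(np.argmax(cosine_similarity_by_lag(source, delayed, max_lag))) - max_lag
lag_gcc = int(np.argmax(gcc_phat(source, delayed, max_lag))) - max_lag
print("cosine-similarity lag estimate:", lag_cos)
print("GCC-PHAT lag estimate:", lag_gcc)
```

In this sketch the PHAT step divides the cross-spectrum by its magnitude, which is the fixed frequency weighting that a learnable variant would presumably replace with trained parameters. That reading is an assumption based on the abstract alone.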

Original language: English
Title of host publication: APSIPA ASC 2024 - Asia Pacific Signal and Information Processing Association Annual Summit and Conference 2024
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9798350367331
DOI
Publication status: Published - 2024
Event: 2024 Asia Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2024 - Macau, China
Duration: 3 Dec 2024 to 6 Dec 2024

Publication series

Name: APSIPA ASC 2024 - Asia Pacific Signal and Information Processing Association Annual Summit and Conference 2024

Conference

Conference: 2024 Asia Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2024
Country/Territory: China
City: Macau
Period: 3/12/24 to 6/12/24
