Learnable Cross-Correlation based Filter-and-Sum Networks for Multi-channel Speech Separation

Xianrui Wang; Shiqi Zhang; Bo He; Shoji Makino; Jingdong Chen

doi:10.1109/APSIPAASC63619.2025.10848617

School of Marine Science and Technology

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Multichannel source separation plays an important role in audio and speech signal processing. With recent advancements in deep neural networks (DNN), numerous DNN-based beamforming algorithms have been developed. To leverage spatial information, a time domain filter-and-sum network (FaSNet) was introduced, and the transform average concatenate (TAC) technique was subsequently adopted to further enhance separation performance. FaSNet captures spatial information by assessing cosine similarity between different channels; but this approach may have limited spatial resolution and could exhibit bias in noisy, reverberant environments, thereby potentially compromising performance. Motivated by the efficacy of the generalized cross-correlation (GCC) method in achieving reliable source localization in adverse environments, this paper introduces a learnable cross-correlation (LCC) module for FaSNet and FaSNet-TAC. By offering improved flexibility and robustness across diverse environments, LCC enhances source separation performance, which is validated by several simulations.
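
The cosine-similarity idea mentioned in the abstract can be illustrated with a minimal sketch. It compares a window from a reference channel with shifted segments of a second channel and reads the inter-channel delay off the peak. This is not the authors' code, nor the FaSNet or LCC implementation; the function names `cosine_similarity` and `lagged_similarity`, the window sizes, and the toy signal are assumptions made purely for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length sample windows (0.0 if either is silent)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def lagged_similarity(ref_window, other_context, max_lag):
    """Compare ref_window with every shifted segment of other_context.

    other_context must hold len(ref_window) + 2 * max_lag samples, so that
    index k of the result corresponds to a lag of (k - max_lag) samples.
    """
    length = len(ref_window)
    if len(other_context) != length + 2 * max_lag:
        raise ValueError("other_context must have len(ref_window) + 2 * max_lag samples")
    return [
        cosine_similarity(ref_window, other_context[k:k + length])
        for k in range(2 * max_lag + 1)
    ]


# Toy data (made up): the second channel is the reference delayed by 2 samples.
ref = [1.0, -2.0, 3.0, 0.5]
max_lag = 3
other = [0.0] * (len(ref) + 2 * max_lag)
other[max_lag + 2:max_lag + 2 + len(ref)] = ref

feature = lagged_similarity(ref, other, max_lag)
best_lag = max(range(len(feature)), key=feature.__getitem__) - max_lag
print(best_lag)  # prints 2
```

In this noise-free toy case the similarity peaks exactly at the true delay. The abstract's concern is that such a fixed similarity measure may be less reliable in noisy, reverberant conditions, which is the motivation it gives for a learnable cross-correlation module.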

Original language	English
Title of host publication	APSIPA ASC 2024 - Asia Pacific Signal and Information Processing Association Annual Summit and Conference 2024
Publisher	Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic)	9798350367331
DOI	https://doi.org/10.1109/APSIPAASC63619.2025.10848617
Publication status	Published - 2024
Event	2024 Asia Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2024 - Macau, China; Duration: 3 Dec 2024 → 6 Dec 2024

Publication series

Name	APSIPA ASC 2024 - Asia Pacific Signal and Information Processing Association Annual Summit and Conference 2024

Conference

Conference	2024 Asia Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2024
Country/Territory	China
City	Macau
Period	3 Dec 2024 → 6 Dec 2024

Access to Document

10.1109/APSIPAASC63619.2025.10848617

Other files and links

Link to publication in Scopus

Cite this

Wang, X., Zhang, S., He, B., Makino, S., & Chen, J. (2024). Learnable Cross-Correlation based Filter-and-Sum Networks for Multi-channel Speech Separation. In APSIPA ASC 2024 - Asia Pacific Signal and Information Processing Association Annual Summit and Conference 2024 (APSIPA ASC 2024 - Asia Pacific Signal and Information Processing Association Annual Summit and Conference 2024). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/APSIPAASC63619.2025.10848617

Wang, Xianrui ; Zhang, Shiqi ; He, Bo et al. / Learnable Cross-Correlation based Filter-and-Sum Networks for Multi-channel Speech Separation. APSIPA ASC 2024 - Asia Pacific Signal and Information Processing Association Annual Summit and Conference 2024. Institute of Electrical and Electronics Engineers Inc., 2024. (APSIPA ASC 2024 - Asia Pacific Signal and Information Processing Association Annual Summit and Conference 2024).

@inproceedings{9004612fb9b447e991a52fb890582929,

title = "Learnable Cross-Correlation based Filter-and-Sum Networks for Multi-channel Speech Separation",

abstract = "Multichannel source separation plays an important role in audio and speech signal processing. With recent advancements in deep neural networks (DNN), numerous DNN-based beamforming algorithms have been developed. To leverage spatial information, a time domain filter-and-sum network (FaSNet) was introduced, and the transform average concatenate (TAC) technique was subsequently adopted to further enhance separation performance. FaSNet captures spatial information by assessing cosine similarity between different channels; but this approach may have limited spatial resolution and could exhibit bias in noisy, reverberant environments, thereby potentially compromising performance. Motivated by the efficacy of the generalized cross-correlation (GCC) method in achieving reliable source localization in adverse environments, this paper introduces a learnable cross-correlation (LCC) module for FaSNet and FaSNet-TAC. By offering improved flexibility and robustness across diverse environments, LCC enhances source separation performance, which is validated by several simulations.",

keywords = "Multichannel source separation, learnable cross-correlation, neural network based beamforming, spatial information",

author = "Xianrui Wang and Shiqi Zhang and Bo He and Shoji Makino and Jingdong Chen",

note = "Publisher Copyright: {\textcopyright} 2024 IEEE.; 2024 Asia Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2024 ; Conference date: 03-12-2024 Through 06-12-2024",

year = "2024",

doi = "10.1109/APSIPAASC63619.2025.10848617",

language = "English",

series = "APSIPA ASC 2024 - Asia Pacific Signal and Information Processing Association Annual Summit and Conference 2024",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

booktitle = "APSIPA ASC 2024 - Asia Pacific Signal and Information Processing Association Annual Summit and Conference 2024",

}

Wang, X, Zhang, S, He, B, Makino, S & Chen, J 2024, Learnable Cross-Correlation based Filter-and-Sum Networks for Multi-channel Speech Separation. In APSIPA ASC 2024 - Asia Pacific Signal and Information Processing Association Annual Summit and Conference 2024. APSIPA ASC 2024 - Asia Pacific Signal and Information Processing Association Annual Summit and Conference 2024, Institute of Electrical and Electronics Engineers Inc., 2024 Asia Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2024, Macau, China, 3 Dec 2024. https://doi.org/10.1109/APSIPAASC63619.2025.10848617

Learnable Cross-Correlation based Filter-and-Sum Networks for Multi-channel Speech Separation. / Wang, Xianrui; Zhang, Shiqi; He, Bo et al.
APSIPA ASC 2024 - Asia Pacific Signal and Information Processing Association Annual Summit and Conference 2024. Institute of Electrical and Electronics Engineers Inc., 2024. (APSIPA ASC 2024 - Asia Pacific Signal and Information Processing Association Annual Summit and Conference 2024).

TY - GEN

T1 - Learnable Cross-Correlation based Filter-and-Sum Networks for Multi-channel Speech Separation

AU - Wang, Xianrui

AU - Zhang, Shiqi

AU - He, Bo

AU - Makino, Shoji

AU - Chen, Jingdong

PY - 2024

Y1 - 2024

N2 - Multichannel source separation plays an important role in audio and speech signal processing. With recent advancements in deep neural networks (DNN), numerous DNN-based beamforming algorithms have been developed. To leverage spatial information, a time domain filter-and-sum network (FaSNet) was introduced, and the transform average concatenate (TAC) technique was subsequently adopted to further enhance separation performance. FaSNet captures spatial information by assessing cosine similarity between different channels; but this approach may have limited spatial resolution and could exhibit bias in noisy, reverberant environments, thereby potentially compromising performance. Motivated by the efficacy of the generalized cross-correlation (GCC) method in achieving reliable source localization in adverse environments, this paper introduces a learnable cross-correlation (LCC) module for FaSNet and FaSNet-TAC. By offering improved flexibility and robustness across diverse environments, LCC enhances source separation performance, which is validated by several simulations.

AB - Multichannel source separation plays an important role in audio and speech signal processing. With recent advancements in deep neural networks (DNN), numerous DNN-based beamforming algorithms have been developed. To leverage spatial information, a time domain filter-and-sum network (FaSNet) was introduced, and the transform average concatenate (TAC) technique was subsequently adopted to further enhance separation performance. FaSNet captures spatial information by assessing cosine similarity between different channels; but this approach may have limited spatial resolution and could exhibit bias in noisy, reverberant environments, thereby potentially compromising performance. Motivated by the efficacy of the generalized cross-correlation (GCC) method in achieving reliable source localization in adverse environments, this paper introduces a learnable cross-correlation (LCC) module for FaSNet and FaSNet-TAC. By offering improved flexibility and robustness across diverse environments, LCC enhances source separation performance, which is validated by several simulations.

KW - Multichannel source separation

KW - learnable cross-correlation

KW - neural network based beamforming

KW - spatial information

UR - http://www.scopus.com/inward/record.url?scp=85218195839&partnerID=8YFLogxK

U2 - 10.1109/APSIPAASC63619.2025.10848617

DO - 10.1109/APSIPAASC63619.2025.10848617

M3 - Conference contribution

AN - SCOPUS:85218195839

T3 - APSIPA ASC 2024 - Asia Pacific Signal and Information Processing Association Annual Summit and Conference 2024

BT - APSIPA ASC 2024 - Asia Pacific Signal and Information Processing Association Annual Summit and Conference 2024

PB - Institute of Electrical and Electronics Engineers Inc.

T2 - 2024 Asia Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2024

Y2 - 3 December 2024 through 6 December 2024

ER -

Wang X, Zhang S, He B, Makino S, Chen J. Learnable Cross-Correlation based Filter-and-Sum Networks for Multi-channel Speech Separation. In APSIPA ASC 2024 - Asia Pacific Signal and Information Processing Association Annual Summit and Conference 2024. Institute of Electrical and Electronics Engineers Inc. 2024. (APSIPA ASC 2024 - Asia Pacific Signal and Information Processing Association Annual Summit and Conference 2024). doi: 10.1109/APSIPAASC63619.2025.10848617
