Spatial-DCCRN: DCCRN Equipped with Frame-Level Angle Feature and Hybrid Filtering for Multi-Channel Speech Enhancement

Shubo Lv; Yihui Fu; Yukai Jv; Lei Xie; Weixin Zhu; Wei Rao; Yannan Wang

doi:10.1109/SLT54892.2023.10022488

School of Computer Science

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Peer-reviewed

11 citations (Scopus)

Abstract

Recently, multi-channel speech enhancement has drawn much interest due to the use of spatial information to distinguish target speech from interfering signals. To make full use of spatial information and neural-network-based mask estimation, we propose a multi-channel denoising neural network, Spatial-DCCRN. First, we extend S-DCCRN to the multi-channel scenario, performing a cascaded sub-channel and full-channel processing strategy that can model different channels separately. Moreover, instead of only adopting the multi-channel spectrum or concatenating the first channel's magnitude and IPD as the model's inputs, we apply an angle feature extraction (AFE) module to extract frame-level angle feature embeddings, which help the model perceive spatial information explicitly. Finally, since residual noise is more severe when noise and speech exist in the same time-frequency (TF) bin, we design a masking-and-mapping filtering method to replace the traditional filter-and-sum operation, with the purpose of cascading coarse denoising, dereverberation, and residual noise suppression. The proposed model, Spatial-DCCRN, surpasses EaBNet, FasNet, and several competitive models on the L3DAS22 Challenge dataset. Beyond the 3D scenario, Spatial-DCCRN outperforms the state-of-the-art (SOTA) model MIMO-UNet by a large margin on multiple evaluation metrics on the multi-channel ConferencingSpeech2021 Challenge dataset. Ablation studies also demonstrate the effectiveness of the individual contributions.
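
The abstract contrasts the proposed approach with two conventional ingredients: IPD input features and the filter-and-sum operation. As a rough illustration of those baselines only (not the paper's AFE module or its masking-and-mapping filter, which the abstract does not specify), the pure-Python sketch below assumes that IPD means the per-bin phase difference between a reference channel and another channel, and that filter-and-sum multiplies each channel's complex spectrum by a per-bin complex weight and sums over channels. The function names `ipd` and `filter_and_sum`, the nested-list layout `[frame][freq]`, and the sign convention are all illustrative assumptions.

```python
import cmath


def ipd(x_ref, x_other):
    """Phase difference (radians) between two channels, per TF bin.

    Assumed convention: phase(x_other) - phase(x_ref), wrapped to [-pi, pi].
    Both inputs are nested lists of complex values laid out as [frame][freq].
    """
    return [
        [cmath.phase(b * a.conjugate()) for a, b in zip(row_ref, row_other)]
        for row_ref, row_other in zip(x_ref, x_other)
    ]


def filter_and_sum(weights, spectra):
    """Weighted sum over channels, per TF bin.

    weights and spectra are nested lists laid out as [channel][frame][freq].
    In a neural system the weights would be predicted by a network; here
    they are simply passed in by the caller.
    """
    n_frames = len(spectra[0])
    n_freq = len(spectra[0][0])
    out = [[0j] * n_freq for _ in range(n_frames)]
    for w_c, x_c in zip(weights, spectra):
        for t in range(n_frames):
            for f in range(n_freq):
                out[t][f] += w_c[t][f] * x_c[t][f]
    return out


# Toy input: two channels, one frame, two frequency bins.
ch0 = [[1 + 0j, 1j]]
ch1 = [[1j, -1 + 0j]]
phase_diff = ipd(ch0, ch1)
equal_weights = [[[0.5, 0.5]], [[0.5, 0.5]]]
enhanced = filter_and_sum(equal_weights, [ch0, ch1])
```

With equal weights this reduces to a plain average of the channels; a learned filter would instead vary the weights per frame and frequency bin.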

Original language	English
Title of host publication	2022 IEEE Spoken Language Technology Workshop, SLT 2022 - Proceedings
Publisher	Institute of Electrical and Electronics Engineers Inc.
Pages	436-443
Number of pages	8
ISBN (Electronic)	9798350396904
DOI	https://doi.org/10.1109/SLT54892.2023.10022488
Publication status	Published - 2023
Event	2022 IEEE Spoken Language Technology Workshop, SLT 2022 - Doha, Qatar. Duration: 9 Jan 2023 → 12 Jan 2023

Publication series

Name	2022 IEEE Spoken Language Technology Workshop, SLT 2022 - Proceedings

Conference

Conference	2022 IEEE Spoken Language Technology Workshop, SLT 2022
Country/Territory	Qatar
City	Doha
Period	9 Jan 2023 → 12 Jan 2023

Cite this

Lv, S., Fu, Y., Jv, Y., Xie, L., Zhu, W., Rao, W., & Wang, Y. (2023). Spatial-DCCRN: DCCRN Equipped with Frame-Level Angle Feature and Hybrid Filtering for Multi-Channel Speech Enhancement. In 2022 IEEE Spoken Language Technology Workshop, SLT 2022 - Proceedings (pp. 436-443). (2022 IEEE Spoken Language Technology Workshop, SLT 2022 - Proceedings). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/SLT54892.2023.10022488

Lv, Shubo ; Fu, Yihui ; Jv, Yukai et al. / Spatial-DCCRN : DCCRN Equipped with Frame-Level Angle Feature and Hybrid Filtering for Multi-Channel Speech Enhancement. 2022 IEEE Spoken Language Technology Workshop, SLT 2022 - Proceedings. Institute of Electrical and Electronics Engineers Inc., 2023. pp. 436-443 (2022 IEEE Spoken Language Technology Workshop, SLT 2022 - Proceedings).

@inproceedings{a0508ded44794be28c31b31d93baa2dc,

title = "Spatial-DCCRN: DCCRN Equipped with Frame-Level Angle Feature and Hybrid Filtering for Multi-Channel Speech Enhancement",

abstract = "Recently, multi-channel speech enhancement has drawn much interest due to the use of spatial information to distinguish target speech from interfering signals. To make full use of spatial information and neural-network-based mask estimation, we propose a multi-channel denoising neural network, Spatial-DCCRN. First, we extend S-DCCRN to the multi-channel scenario, performing a cascaded sub-channel and full-channel processing strategy that can model different channels separately. Moreover, instead of only adopting the multi-channel spectrum or concatenating the first channel's magnitude and IPD as the model's inputs, we apply an angle feature extraction (AFE) module to extract frame-level angle feature embeddings, which help the model perceive spatial information explicitly. Finally, since residual noise is more severe when noise and speech exist in the same time-frequency (TF) bin, we design a masking-and-mapping filtering method to replace the traditional filter-and-sum operation, with the purpose of cascading coarse denoising, dereverberation, and residual noise suppression. The proposed model, Spatial-DCCRN, surpasses EaBNet, FasNet, and several competitive models on the L3DAS22 Challenge dataset. Beyond the 3D scenario, Spatial-DCCRN outperforms the state-of-the-art (SOTA) model MIMO-UNet by a large margin on multiple evaluation metrics on the multi-channel ConferencingSpeech2021 Challenge dataset. Ablation studies also demonstrate the effectiveness of the individual contributions.",

keywords = "multi-channel, Spatial-DCCRN, speech enhancement",

author = "Shubo Lv and Yihui Fu and Yukai Jv and Lei Xie and Weixin Zhu and Wei Rao and Yannan Wang",

note = "Publisher Copyright: {\textcopyright} 2023 IEEE.; 2022 IEEE Spoken Language Technology Workshop, SLT 2022 ; Conference date: 09-01-2023 Through 12-01-2023",

year = "2023",

doi = "10.1109/SLT54892.2023.10022488",

language = "English",

series = "2022 IEEE Spoken Language Technology Workshop, SLT 2022 - Proceedings",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

pages = "436--443",

booktitle = "2022 IEEE Spoken Language Technology Workshop, SLT 2022 - Proceedings",

}

Lv, S, Fu, Y, Jv, Y, Xie, L, Zhu, W, Rao, W & Wang, Y 2023, Spatial-DCCRN: DCCRN Equipped with Frame-Level Angle Feature and Hybrid Filtering for Multi-Channel Speech Enhancement. in 2022 IEEE Spoken Language Technology Workshop, SLT 2022 - Proceedings. 2022 IEEE Spoken Language Technology Workshop, SLT 2022 - Proceedings, Institute of Electrical and Electronics Engineers Inc., pp. 436-443, 2022 IEEE Spoken Language Technology Workshop, SLT 2022, Doha, Qatar, 9 January 2023. https://doi.org/10.1109/SLT54892.2023.10022488

Spatial-DCCRN: DCCRN Equipped with Frame-Level Angle Feature and Hybrid Filtering for Multi-Channel Speech Enhancement. / Lv, Shubo; Fu, Yihui; Jv, Yukai et al.
2022 IEEE Spoken Language Technology Workshop, SLT 2022 - Proceedings. Institute of Electrical and Electronics Engineers Inc., 2023. pp. 436-443 (2022 IEEE Spoken Language Technology Workshop, SLT 2022 - Proceedings).


TY - GEN

T1 - Spatial-DCCRN

T2 - 2022 IEEE Spoken Language Technology Workshop, SLT 2022

AU - Lv, Shubo

AU - Fu, Yihui

AU - Jv, Yukai

AU - Xie, Lei

AU - Zhu, Weixin

AU - Rao, Wei

AU - Wang, Yannan

PY - 2023

Y1 - 2023

N2 - Recently, multi-channel speech enhancement has drawn much interest due to the use of spatial information to distinguish target speech from interfering signals. To make full use of spatial information and neural-network-based mask estimation, we propose a multi-channel denoising neural network, Spatial-DCCRN. First, we extend S-DCCRN to the multi-channel scenario, performing a cascaded sub-channel and full-channel processing strategy that can model different channels separately. Moreover, instead of only adopting the multi-channel spectrum or concatenating the first channel's magnitude and IPD as the model's inputs, we apply an angle feature extraction (AFE) module to extract frame-level angle feature embeddings, which help the model perceive spatial information explicitly. Finally, since residual noise is more severe when noise and speech exist in the same time-frequency (TF) bin, we design a masking-and-mapping filtering method to replace the traditional filter-and-sum operation, with the purpose of cascading coarse denoising, dereverberation, and residual noise suppression. The proposed model, Spatial-DCCRN, surpasses EaBNet, FasNet, and several competitive models on the L3DAS22 Challenge dataset. Beyond the 3D scenario, Spatial-DCCRN outperforms the state-of-the-art (SOTA) model MIMO-UNet by a large margin on multiple evaluation metrics on the multi-channel ConferencingSpeech2021 Challenge dataset. Ablation studies also demonstrate the effectiveness of the individual contributions.

AB - Recently, multi-channel speech enhancement has drawn much interest due to the use of spatial information to distinguish target speech from interfering signals. To make full use of spatial information and neural-network-based mask estimation, we propose a multi-channel denoising neural network, Spatial-DCCRN. First, we extend S-DCCRN to the multi-channel scenario, performing a cascaded sub-channel and full-channel processing strategy that can model different channels separately. Moreover, instead of only adopting the multi-channel spectrum or concatenating the first channel's magnitude and IPD as the model's inputs, we apply an angle feature extraction (AFE) module to extract frame-level angle feature embeddings, which help the model perceive spatial information explicitly. Finally, since residual noise is more severe when noise and speech exist in the same time-frequency (TF) bin, we design a masking-and-mapping filtering method to replace the traditional filter-and-sum operation, with the purpose of cascading coarse denoising, dereverberation, and residual noise suppression. The proposed model, Spatial-DCCRN, surpasses EaBNet, FasNet, and several competitive models on the L3DAS22 Challenge dataset. Beyond the 3D scenario, Spatial-DCCRN outperforms the state-of-the-art (SOTA) model MIMO-UNet by a large margin on multiple evaluation metrics on the multi-channel ConferencingSpeech2021 Challenge dataset. Ablation studies also demonstrate the effectiveness of the individual contributions.

KW - multi-channel

KW - Spatial-DCCRN

KW - speech enhancement

UR - http://www.scopus.com/inward/record.url?scp=85147793892&partnerID=8YFLogxK

U2 - 10.1109/SLT54892.2023.10022488

DO - 10.1109/SLT54892.2023.10022488

M3 - Conference contribution

AN - SCOPUS:85147793892

T3 - 2022 IEEE Spoken Language Technology Workshop, SLT 2022 - Proceedings

SP - 436

EP - 443

BT - 2022 IEEE Spoken Language Technology Workshop, SLT 2022 - Proceedings

PB - Institute of Electrical and Electronics Engineers Inc.

Y2 - 9 January 2023 through 12 January 2023

ER -

Lv S, Fu Y, Jv Y, Xie L, Zhu W, Rao W et al. Spatial-DCCRN: DCCRN Equipped with Frame-Level Angle Feature and Hybrid Filtering for Multi-Channel Speech Enhancement. In 2022 IEEE Spoken Language Technology Workshop, SLT 2022 - Proceedings. Institute of Electrical and Electronics Engineers Inc. 2023. p. 436-443. (2022 IEEE Spoken Language Technology Workshop, SLT 2022 - Proceedings). doi: 10.1109/SLT54892.2023.10022488
