TY - GEN
T1 - Spatial-DCCRN
T2 - 2022 IEEE Spoken Language Technology Workshop, SLT 2022
AU - Lv, Shubo
AU - Fu, Yihui
AU - Jv, Yukai
AU - Xie, Lei
AU - Zhu, Weixin
AU - Rao, Wei
AU - Wang, Yannan
N1 - Publisher Copyright:
© 2023 IEEE.
PY - 2023
Y1 - 2023
N2 - Recently, multi-channel speech enhancement has drawn much interest due to the use of spatial information to distinguish target speech from interfering signals. To make full use of spatial information and neural-network-based mask estimation, we propose a multi-channel denoising neural network, Spatial-DCCRN. First, we extend S-DCCRN to the multi-channel scenario, performing a cascaded sub-channel and full-channel processing strategy that can model different channels separately. Moreover, instead of only adopting the multi-channel spectrum or concatenating the first channel's magnitude and IPD as the model's inputs, we apply an angle feature extraction (AFE) module to extract frame-level angle feature embeddings, which help the model explicitly perceive spatial information. Finally, since residual noise is more severe when noise and speech occupy the same time-frequency (TF) bin, we design a masking and mapping filtering method to substitute the traditional filter-and-sum operation, with the purpose of cascading coarse denoising, dereverberation, and residual noise suppression. The proposed model, Spatial-DCCRN, surpasses EaBNet, FaSNet, and several other competitive models on the L3DAS22 Challenge dataset. Beyond the 3D scenario, Spatial-DCCRN also outperforms the state-of-the-art (SOTA) model MIMO-UNet by a large margin on multiple evaluation metrics on the multi-channel ConferencingSpeech 2021 Challenge dataset. Ablation studies further demonstrate the effectiveness of the different contributions.
AB - Recently, multi-channel speech enhancement has drawn much interest due to the use of spatial information to distinguish target speech from interfering signals. To make full use of spatial information and neural-network-based mask estimation, we propose a multi-channel denoising neural network, Spatial-DCCRN. First, we extend S-DCCRN to the multi-channel scenario, performing a cascaded sub-channel and full-channel processing strategy that can model different channels separately. Moreover, instead of only adopting the multi-channel spectrum or concatenating the first channel's magnitude and IPD as the model's inputs, we apply an angle feature extraction (AFE) module to extract frame-level angle feature embeddings, which help the model explicitly perceive spatial information. Finally, since residual noise is more severe when noise and speech occupy the same time-frequency (TF) bin, we design a masking and mapping filtering method to substitute the traditional filter-and-sum operation, with the purpose of cascading coarse denoising, dereverberation, and residual noise suppression. The proposed model, Spatial-DCCRN, surpasses EaBNet, FaSNet, and several other competitive models on the L3DAS22 Challenge dataset. Beyond the 3D scenario, Spatial-DCCRN also outperforms the state-of-the-art (SOTA) model MIMO-UNet by a large margin on multiple evaluation metrics on the multi-channel ConferencingSpeech 2021 Challenge dataset. Ablation studies further demonstrate the effectiveness of the different contributions.
KW - multi-channel
KW - Spatial-DCCRN
KW - speech enhancement
UR - http://www.scopus.com/inward/record.url?scp=85147793892&partnerID=8YFLogxK
U2 - 10.1109/SLT54892.2023.10022488
DO - 10.1109/SLT54892.2023.10022488
M3 - Conference contribution
AN - SCOPUS:85147793892
T3 - 2022 IEEE Spoken Language Technology Workshop, SLT 2022 - Proceedings
SP - 436
EP - 443
BT - 2022 IEEE Spoken Language Technology Workshop, SLT 2022 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 9 January 2023 through 12 January 2023
ER -