S-DCCRN: SUPER WIDE BAND DCCRN WITH LEARNABLE COMPLEX FEATURE FOR SPEECH ENHANCEMENT

Shubo Lv; Yihui Fu; Mengtao Xing; Jiayao Sun; Lei Xie; Jun Huang; Yannan Wang; Tao Yu

doi:10.1109/ICASSP43922.2022.9747029

S-DCCRN: SUPER WIDE BAND DCCRN WITH LEARNABLE COMPLEX FEATURE FOR SPEECH ENHANCEMENT

Shubo Lv, Yihui Fu, Mengtao Xing, Jiayao Sun, Lei Xie, Jun Huang, Yannan Wang, Tao Yu

School of Computer Science

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

50 Scopus citations

Abstract

In speech enhancement, complex neural network has shown promising performance due to their effectiveness in processing complex-valued spectrum. Most of the recent speech enhancement approaches mainly focus on wide-band signal with a sampling rate of 16K Hz. However, research on super wide band (e.g., 32K Hz) or even full-band (48K) denoising using deep learning is still in its infancy due to the difficulty of modeling more frequency bands and particularly high frequency components. In this paper, we extend our previous deep complex convolution recurrent neural network (DCCRN) substantially to a super wide band version - S-DCCRN, to perform speech denoising on speech of 32K Hz sampling rate. We first employ a cascaded sub-band and full-band processing module, which consists of two small-footprint DCCRNs - one operates on sub-band signal and one operates on full-band signal, aiming at benefiting from both local and global frequency information. Moreover, instead of simply adopting the STFT feature as input, we use a complex feature encoder trained in an end-to-end manner to refine the information of different frequency bands. We also use a complex feature decoder to revert the feature to time-frequency domain. Finally, a learnable spectrum compression method is adopted to adjust the energy of different frequency bands, which is beneficial for neural network learning. The proposed model, S-DCCRN, has surpassed PercepNet as well as several competitive models and achieves state-of-the-art performance in terms of speech quality and intelligibility. Ablation studies also demonstrate the effectiveness of different contributions.

Original language	English
Title of host publication	2022 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022 - Proceedings
Publisher	Institute of Electrical and Electronics Engineers Inc.
Pages	7767-7771
Number of pages	5
ISBN (Electronic)	9781665405409
DOIs	https://doi.org/10.1109/ICASSP43922.2022.9747029
State	Published - 2022
Event	2022 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2022 - Hybrid, Singapore Duration: 22 May 2022 → 27 May 2022

Publication series

Name	ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Volume	2022-May
ISSN (Print)	1520-6149

Conference

Conference	2022 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2022
Country/Territory	Singapore
City	Hybrid
Period	22/05/22 → 27/05/22

Keywords

S-DCCRN
speech enhancement
super wide band

Access to Document

10.1109/ICASSP43922.2022.9747029

Cite this

Lv, S., Fu, Y., Xing, M., Sun, J., Xie, L., Huang, J., Wang, Y., & Yu, T. (2022). S-DCCRN: SUPER WIDE BAND DCCRN WITH LEARNABLE COMPLEX FEATURE FOR SPEECH ENHANCEMENT. In 2022 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022 - Proceedings (pp. 7767-7771). (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings; Vol. 2022-May). Institute of Electrical and Electronics Engineers Inc.. https://doi.org/10.1109/ICASSP43922.2022.9747029

Lv, Shubo ; Fu, Yihui ; Xing, Mengtao et al. / S-DCCRN : SUPER WIDE BAND DCCRN WITH LEARNABLE COMPLEX FEATURE FOR SPEECH ENHANCEMENT. 2022 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022 - Proceedings. Institute of Electrical and Electronics Engineers Inc., 2022. pp. 7767-7771 (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings).

@inproceedings{cd1545e90e8e4728ae8dc6e0f1593a98,

title = "S-DCCRN: SUPER WIDE BAND DCCRN WITH LEARNABLE COMPLEX FEATURE FOR SPEECH ENHANCEMENT",

abstract = "In speech enhancement, complex neural network has shown promising performance due to their effectiveness in processing complex-valued spectrum. Most of the recent speech enhancement approaches mainly focus on wide-band signal with a sampling rate of 16K Hz. However, research on super wide band (e.g., 32K Hz) or even full-band (48K) denoising using deep learning is still in its infancy due to the difficulty of modeling more frequency bands and particularly high frequency components. In this paper, we extend our previous deep complex convolution recurrent neural network (DCCRN) substantially to a super wide band version - S-DCCRN, to perform speech denoising on speech of 32K Hz sampling rate. We first employ a cascaded sub-band and full-band processing module, which consists of two small-footprint DCCRNs - one operates on sub-band signal and one operates on full-band signal, aiming at benefiting from both local and global frequency information. Moreover, instead of simply adopting the STFT feature as input, we use a complex feature encoder trained in an end-to-end manner to refine the information of different frequency bands. We also use a complex feature decoder to revert the feature to time-frequency domain. Finally, a learnable spectrum compression method is adopted to adjust the energy of different frequency bands, which is beneficial for neural network learning. The proposed model, S-DCCRN, has surpassed PercepNet as well as several competitive models and achieves state-of-the-art performance in terms of speech quality and intelligibility. Ablation studies also demonstrate the effectiveness of different contributions.",

keywords = "S-DCCRN, speech enhancement, super wide band",

author = "Shubo Lv and Yihui Fu and Mengtao Xing and Jiayao Sun and Lei Xie and Jun Huang and Yannan Wang and Tao Yu",

note = "Publisher Copyright: {\textcopyright} 2022 IEEE; 2022 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2022 ; Conference date: 22-05-2022 Through 27-05-2022",

year = "2022",

doi = "10.1109/ICASSP43922.2022.9747029",

language = "英语",

series = "ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

pages = "7767--7771",

booktitle = "2022 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022 - Proceedings",

}

Lv, S, Fu, Y, Xing, M, Sun, J, Xie, L, Huang, J, Wang, Y & Yu, T 2022, S-DCCRN: SUPER WIDE BAND DCCRN WITH LEARNABLE COMPLEX FEATURE FOR SPEECH ENHANCEMENT. in 2022 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022 - Proceedings. ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, vol. 2022-May, Institute of Electrical and Electronics Engineers Inc., pp. 7767-7771, 2022 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2022, Hybrid, Singapore, 22/05/22. https://doi.org/10.1109/ICASSP43922.2022.9747029

S-DCCRN: SUPER WIDE BAND DCCRN WITH LEARNABLE COMPLEX FEATURE FOR SPEECH ENHANCEMENT. / Lv, Shubo; Fu, Yihui; Xing, Mengtao et al.
2022 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022 - Proceedings. Institute of Electrical and Electronics Engineers Inc., 2022. p. 7767-7771 (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings; Vol. 2022-May).

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

TY - GEN

T1 - S-DCCRN

T2 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2022

AU - Lv, Shubo

AU - Fu, Yihui

AU - Xing, Mengtao

AU - Sun, Jiayao

AU - Xie, Lei

AU - Huang, Jun

AU - Wang, Yannan

AU - Yu, Tao

PY - 2022

Y1 - 2022

N2 - In speech enhancement, complex neural network has shown promising performance due to their effectiveness in processing complex-valued spectrum. Most of the recent speech enhancement approaches mainly focus on wide-band signal with a sampling rate of 16K Hz. However, research on super wide band (e.g., 32K Hz) or even full-band (48K) denoising using deep learning is still in its infancy due to the difficulty of modeling more frequency bands and particularly high frequency components. In this paper, we extend our previous deep complex convolution recurrent neural network (DCCRN) substantially to a super wide band version - S-DCCRN, to perform speech denoising on speech of 32K Hz sampling rate. We first employ a cascaded sub-band and full-band processing module, which consists of two small-footprint DCCRNs - one operates on sub-band signal and one operates on full-band signal, aiming at benefiting from both local and global frequency information. Moreover, instead of simply adopting the STFT feature as input, we use a complex feature encoder trained in an end-to-end manner to refine the information of different frequency bands. We also use a complex feature decoder to revert the feature to time-frequency domain. Finally, a learnable spectrum compression method is adopted to adjust the energy of different frequency bands, which is beneficial for neural network learning. The proposed model, S-DCCRN, has surpassed PercepNet as well as several competitive models and achieves state-of-the-art performance in terms of speech quality and intelligibility. Ablation studies also demonstrate the effectiveness of different contributions.

AB - In speech enhancement, complex neural network has shown promising performance due to their effectiveness in processing complex-valued spectrum. Most of the recent speech enhancement approaches mainly focus on wide-band signal with a sampling rate of 16K Hz. However, research on super wide band (e.g., 32K Hz) or even full-band (48K) denoising using deep learning is still in its infancy due to the difficulty of modeling more frequency bands and particularly high frequency components. In this paper, we extend our previous deep complex convolution recurrent neural network (DCCRN) substantially to a super wide band version - S-DCCRN, to perform speech denoising on speech of 32K Hz sampling rate. We first employ a cascaded sub-band and full-band processing module, which consists of two small-footprint DCCRNs - one operates on sub-band signal and one operates on full-band signal, aiming at benefiting from both local and global frequency information. Moreover, instead of simply adopting the STFT feature as input, we use a complex feature encoder trained in an end-to-end manner to refine the information of different frequency bands. We also use a complex feature decoder to revert the feature to time-frequency domain. Finally, a learnable spectrum compression method is adopted to adjust the energy of different frequency bands, which is beneficial for neural network learning. The proposed model, S-DCCRN, has surpassed PercepNet as well as several competitive models and achieves state-of-the-art performance in terms of speech quality and intelligibility. Ablation studies also demonstrate the effectiveness of different contributions.

KW - S-DCCRN

KW - speech enhancement

KW - super wide band

UR - http://www.scopus.com/inward/record.url?scp=85131232890&partnerID=8YFLogxK

U2 - 10.1109/ICASSP43922.2022.9747029

DO - 10.1109/ICASSP43922.2022.9747029

M3 - 会议稿件

AN - SCOPUS:85131232890

T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings

SP - 7767

EP - 7771

BT - 2022 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022 - Proceedings

PB - Institute of Electrical and Electronics Engineers Inc.

Y2 - 22 May 2022 through 27 May 2022

ER -

Lv S, Fu Y, Xing M, Sun J, Xie L, Huang J et al. S-DCCRN: SUPER WIDE BAND DCCRN WITH LEARNABLE COMPLEX FEATURE FOR SPEECH ENHANCEMENT. In 2022 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022 - Proceedings. Institute of Electrical and Electronics Engineers Inc. 2022. p. 7767-7771. (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings). doi: 10.1109/ICASSP43922.2022.9747029

S-DCCRN: SUPER WIDE BAND DCCRN WITH LEARNABLE COMPLEX FEATURE FOR SPEECH ENHANCEMENT

Abstract

Publication series

Conference

Keywords

Access to Document

Other files and links

Fingerprint

Cite this