TY - GEN
T1 - DCCRN+: Channel-Wise Subband DCCRN with SNR Estimation for Speech Enhancement
T2 - 22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021
AU - Lv, Shubo
AU - Hu, Yanxin
AU - Zhang, Shimin
AU - Xie, Lei
N1 - Publisher Copyright:
Copyright © 2021 ISCA.
PY - 2021
Y1 - 2021
AB - Deep complex convolution recurrent network (DCCRN), which extends CRN with a complex structure, has achieved superior performance in the MOS evaluation of the Interspeech 2020 Deep Noise Suppression challenge (DNS2020). This paper further extends DCCRN with the following significant revisions. We first extend the model to sub-band processing, where the bands are split and merged by learnable neural network filters instead of engineered FIR filters, leading to a faster noise suppressor trained in an end-to-end manner. The LSTM is then substituted with a complex TF-LSTM to better model temporal dependencies along both the time and frequency axes. Moreover, instead of simply concatenating the output of each encoder layer to the input of the corresponding decoder layer, we use convolution blocks to first aggregate essential information from the encoder output before feeding it to the decoder layers. We also equip the decoder with an extra a priori SNR estimation module to maintain good speech quality while removing noise. Finally, a post-processing module is adopted to further suppress unnatural residual noise. The new model, named DCCRN+, has surpassed the original DCCRN as well as several competitive models in terms of PESQ and DNSMOS, and has achieved superior performance in the new Interspeech 2021 DNS challenge.
KW - Deep complex convolution recurrent network
KW - Speech enhancement
KW - Sub-band processing
UR - http://www.scopus.com/inward/record.url?scp=85117798696&partnerID=8YFLogxK
DO - 10.21437/Interspeech.2021-1482
M3 - Conference contribution
AN - SCOPUS:85117798696
T3 - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
SP - 816
EP - 820
BT - 22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021
PB - International Speech Communication Association
Y2 - 30 August 2021 through 3 September 2021
ER -