Neural speech enhancement with unsupervised pre-training and mixture training

Xiang Hao, Chenglin Xu, Lei Xie

Research output: Contribution to journal › Article › peer-review

Abstract

Supervised neural speech enhancement methods require large amounts of paired noisy and clean speech data. Since collecting adequate paired data from real-world applications is infeasible, supervised methods typically rely on simulated data. However, the mismatch between simulated and in-the-wild data causes inconsistent performance when a system is deployed in real-world applications. Unsupervised speech enhancement methods address this mismatch by directly using in-the-wild noisy data without access to the corresponding clean speech, so simulated paired data is not necessary. However, the performance of unsupervised speech enhancement methods is not on par with supervised learning methods. To address these problems, this work proposes an unsupervised pre-training and mixture training algorithm that leverages the advantages of both supervised and unsupervised learning. Specifically, the proposed approach first employs large volumes of unpaired noisy and clean speech for unsupervised pre-training. The in-the-wild noisy data and a small amount of simulated paired data are then used for mixture training to optimize the pre-trained model. Experimental results show that the proposed method outperforms other state-of-the-art supervised and unsupervised learning methods.
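To make the two-stage schedule concrete, the following is a minimal PyTorch sketch of the training flow the abstract describes. The MaskNet model, the L1 reconstruction losses, and the remix-consistency term are illustrative assumptions made for this sketch; the paper's actual architecture and objectives may differ.

# A minimal sketch of the two-stage schedule from the abstract.
# The model, losses, and data shapes are assumptions for illustration,
# not the paper's actual architecture or objectives.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskNet(nn.Module):
    """Toy magnitude-mask estimator over spectrogram frames."""
    def __init__(self, n_bins=257, hidden=256):
        super().__init__()
        self.rnn = nn.GRU(n_bins, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_bins)

    def forward(self, noisy_mag):          # (batch, frames, bins)
        h, _ = self.rnn(noisy_mag)
        mask = torch.sigmoid(self.out(h))  # bounded mask in [0, 1]
        return mask * noisy_mag            # enhanced magnitude

model = MaskNet()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)

# Stage 1: unsupervised pre-training on large unpaired corpora.
# Assumption: a clean-speech reconstruction term plus a lightly weighted
# identity term on noisy inputs stands in for the paper's pre-training losses.
def pretrain_step(unpaired_clean, unpaired_noisy):
    loss = F.l1_loss(model(unpaired_clean), unpaired_clean) \
         + 0.1 * F.l1_loss(model(unpaired_noisy), unpaired_noisy)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

# Stage 2: mixture training. A small simulated paired set gives a
# supervised loss; in-the-wild noisy data contributes an unsupervised
# term (here: re-corrupting the estimate and enforcing consistency,
# again an assumption for this sketch).
def mixture_step(sim_noisy, sim_clean, wild_noisy, alpha=0.5):
    sup = F.l1_loss(model(sim_noisy), sim_clean)
    est = model(wild_noisy)
    remix = est + 0.1 * torch.rand_like(est)   # synthetic re-corruption
    unsup = F.l1_loss(model(remix), est.detach())
    loss = sup + alpha * unsup
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

# Usage with dummy batches (batch=4, frames=100, bins=257):
clean = torch.rand(4, 100, 257)
noisy = clean + 0.2 * torch.rand_like(clean)
for _ in range(2):
    pretrain_step(clean, noisy)
    mixture_step(noisy, clean, noisy)

The key point the sketch captures is the schedule: optimize on large unpaired corpora first, then fine-tune the same parameters with a supervised term from the small simulated paired set plus an unsupervised term from in-the-wild noisy data.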

Original language: English
Pages (from-to): 216-227
Number of pages: 12
Journal: Neural Networks
Volume: 158
DOIs
State: Published - Jan 2023

Keywords

  • Mixture training
  • Neural network
  • Speech enhancement
  • Unsupervised pre-training
