Time-domain neural network approach for speech bandwidth extension

Xiang Hao; Chenglin Xu; Nana Hou; Lei Xie; Eng Siong Chng; Haizhou Li

doi:10.1109/ICASSP40776.2020.9054551

School of Computer Science

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

12 Scopus citations

Abstract

In this paper, we study the time-domain neural network approach for speech bandwidth extension. We propose a network architecture, named multi-scale fusion neural network (MfNet), that gradually restores the low-frequency signal and predicts the high-frequency signal through the exchange of information across different scale representations. We propose a training scheme to optimize the network with a combination of perceptual loss and time-domain adversarial loss. Experiments show the proposed multi-scale fusion network consistently outperforms the competing methods in terms of perceptual evaluation of speech quality (PESQ), signal to distortion rate (SDR), signal to noise ratio (SNR), log-spectral distance (LSD) and word error rate (WER). More promisingly, the multi-scale fusion network requires only 10% of the parameters of the time-domain reference baseline.
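Two of the objective metrics named in the abstract, SNR and LSD, can be illustrated with a short sketch. It is not taken from the paper: the function names, the frame length (2048), the hop size (512), the Hann window and the base-10 log power spectrum are all assumptions chosen for illustration, and LSD definitions in the literature differ in these details.

```python
import numpy as np


def snr_db(reference, estimate, eps=1e-12):
    """Signal-to-noise ratio in dB between a reference and an estimated waveform."""
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    noise = reference - estimate
    # eps guards against division by zero when the estimate is exact.
    return 10.0 * np.log10((np.sum(reference ** 2) + eps) / (np.sum(noise ** 2) + eps))


def log_power_spectrogram(signal, frame_len=2048, hop=512, eps=1e-10):
    """Framewise log10 power spectrum (frame_len and hop are assumed values)."""
    signal = np.asarray(signal, dtype=np.float64)
    if len(signal) < frame_len:
        # Zero-pad short signals so that at least one frame exists.
        signal = np.pad(signal, (0, frame_len - len(signal)))
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hanning(frame_len)
    frames = np.stack(
        [signal[i * hop: i * hop + frame_len] * window for i in range(n_frames)]
    )
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    return np.log10(power + eps)


def lsd(reference, estimate):
    """Log-spectral distance: RMS over frequency bins, averaged over frames."""
    ref_lps = log_power_spectrogram(reference)
    est_lps = log_power_spectrogram(estimate)
    per_frame = np.sqrt(np.mean((ref_lps - est_lps) ** 2, axis=1))
    return float(np.mean(per_frame))
```

Higher SNR and lower LSD indicate a closer match to the wideband reference. Both functions assume the two waveforms are time-aligned and of equal length.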

Original language	English
Title of host publication	2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020 - Proceedings
Publisher	Institute of Electrical and Electronics Engineers Inc.
Pages	866-870
Number of pages	5
ISBN (Electronic)	9781509066315
DOIs	https://doi.org/10.1109/ICASSP40776.2020.9054551
State	Published - May 2020
Event	2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020 - Barcelona, Spain Duration: 4 May 2020 → 8 May 2020

Publication series

Name	ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Volume	2020-May
ISSN (Print)	1520-6149

Conference

Conference	2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020
Country/Territory	Spain
City	Barcelona
Period	4/05/20 → 8/05/20

Keywords

Deep learning
Multi-scale fusion
Neural networks
Speech bandwidth extension

Access to Document

10.1109/ICASSP40776.2020.9054551

Cite this

Hao, X., Xu, C., Hou, N., Xie, L., Chng, E. S., & Li, H. (2020). Time-domain neural network approach for speech bandwidth extension. In 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020 - Proceedings (pp. 866-870). Article 9054551 (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings; Vol. 2020-May). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICASSP40776.2020.9054551

Hao, Xiang ; Xu, Chenglin ; Hou, Nana et al. / Time-domain neural network approach for speech bandwidth extension. 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020 - Proceedings. Institute of Electrical and Electronics Engineers Inc., 2020. pp. 866-870 (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings).

@inproceedings{835133d71fd44a8b9d1f88524d10e74e,

title = "Time-domain neural network approach for speech bandwidth extension",

abstract = "In this paper, we study the time-domain neural network approach for speech bandwidth extension. We propose a network architecture, named multi-scale fusion neural network (MfNet), that gradually restores the low-frequency signal and predicts the high-frequency signal through the exchange of information across different scale representations. We propose a training scheme to optimize the network with a combination of perceptual loss and time-domain adversarial loss. Experiments show the proposed multi-scale fusion network consistently outperforms the competing methods in terms of perceptual evaluation of speech quality (PESQ), signal to distortion rate (SDR), signal to noise ratio (SNR), log-spectral distance (LSD) and word error rate (WER). More promisingly, the multi-scale fusion network requires only 10% of the parameters of the time-domain reference baseline.",

keywords = "Deep learning, Multi-scale fusion, Neural networks, Speech bandwidth extension",

author = "Xiang Hao and Chenglin Xu and Nana Hou and Lei Xie and Chng, {Eng Siong} and Haizhou Li",

note = "Publisher Copyright: {\textcopyright} 2020 IEEE; 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020 ; Conference date: 04-05-2020 Through 08-05-2020",

year = "2020",

month = may,

doi = "10.1109/ICASSP40776.2020.9054551",

language = "English",

series = "ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

pages = "866--870",

booktitle = "2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020 - Proceedings",

}

Hao, X, Xu, C, Hou, N, Xie, L, Chng, ES & Li, H 2020, Time-domain neural network approach for speech bandwidth extension. in 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020 - Proceedings., 9054551, ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, vol. 2020-May, Institute of Electrical and Electronics Engineers Inc., pp. 866-870, 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020, Barcelona, Spain, 4/05/20. https://doi.org/10.1109/ICASSP40776.2020.9054551

Time-domain neural network approach for speech bandwidth extension. / Hao, Xiang; Xu, Chenglin; Hou, Nana et al.
2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020 - Proceedings. Institute of Electrical and Electronics Engineers Inc., 2020. p. 866-870 9054551 (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings; Vol. 2020-May).

TY - GEN

T1 - Time-domain neural network approach for speech bandwidth extension

AU - Hao, Xiang

AU - Xu, Chenglin

AU - Hou, Nana

AU - Xie, Lei

AU - Chng, Eng Siong

AU - Li, Haizhou

PY - 2020/5

Y1 - 2020/5

N2 - In this paper, we study the time-domain neural network approach for speech bandwidth extension. We propose a network architecture, named multi-scale fusion neural network (MfNet), that gradually restores the low-frequency signal and predicts the high-frequency signal through the exchange of information across different scale representations. We propose a training scheme to optimize the network with a combination of perceptual loss and time-domain adversarial loss. Experiments show the proposed multi-scale fusion network consistently outperforms the competing methods in terms of perceptual evaluation of speech quality (PESQ), signal to distortion rate (SDR), signal to noise ratio (SNR), log-spectral distance (LSD) and word error rate (WER). More promisingly, the multi-scale fusion network requires only 10% of the parameters of the time-domain reference baseline.

AB - In this paper, we study the time-domain neural network approach for speech bandwidth extension. We propose a network architecture, named multi-scale fusion neural network (MfNet), that gradually restores the low-frequency signal and predicts the high-frequency signal through the exchange of information across different scale representations. We propose a training scheme to optimize the network with a combination of perceptual loss and time-domain adversarial loss. Experiments show the proposed multi-scale fusion network consistently outperforms the competing methods in terms of perceptual evaluation of speech quality (PESQ), signal to distortion rate (SDR), signal to noise ratio (SNR), log-spectral distance (LSD) and word error rate (WER). More promisingly, the multi-scale fusion network requires only 10% of the parameters of the time-domain reference baseline.

KW - Deep learning

KW - Multi-scale fusion

KW - Neural networks

KW - Speech bandwidth extension

UR - http://www.scopus.com/inward/record.url?scp=85091153021&partnerID=8YFLogxK

U2 - 10.1109/ICASSP40776.2020.9054551

DO - 10.1109/ICASSP40776.2020.9054551

M3 - Conference contribution

AN - SCOPUS:85091153021

T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings

SP - 866

EP - 870

BT - 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020 - Proceedings

PB - Institute of Electrical and Electronics Engineers Inc.

T2 - 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020

Y2 - 4 May 2020 through 8 May 2020

ER -

Hao X, Xu C, Hou N, Xie L, Chng ES, Li H. Time-domain neural network approach for speech bandwidth extension. In 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020 - Proceedings. Institute of Electrical and Electronics Engineers Inc. 2020. p. 866-870. 9054551. (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings). doi: 10.1109/ICASSP40776.2020.9054551
