Time-domain neural network approach for speech bandwidth extension

Xiang Hao, Chenglin Xu, Nana Hou, Lei Xie, Eng Siong Chng, Haizhou Li

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-reviewed

12 Citations (Scopus)

Abstract

In this paper, we study the time-domain neural network approach for speech bandwidth extension. We propose a network architecture, named multi-scale fusion neural network (MfNet), that gradually restores the low-frequency signal and predicts the high-frequency signal through the exchange of information across different scale representations. We propose a training scheme to optimize the network with a combination of perceptual loss and time-domain adversarial loss. Experiments show the proposed multi-scale fusion network consistently outperforms the competing methods in terms of perceptual evaluation of speech quality (PESQ), signal-to-distortion ratio (SDR), signal-to-noise ratio (SNR), log-spectral distance (LSD) and word error rate (WER). More promisingly, the multi-scale fusion network requires only 10% of the parameters of the time-domain reference baseline.
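
As a rough illustration of two of the objective metrics named in the abstract, the sketch below computes SNR and LSD with NumPy. The function names, the frame length (2048), the hop size (512), the Hann window and the particular LSD formulation are assumptions chosen for illustration; this record does not state the exact settings used in the paper.

```python
import numpy as np


def snr_db(reference, estimate):
    """Signal-to-noise ratio in dB, treating (reference - estimate) as noise."""
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    noise = reference - estimate
    eps = 1e-12  # guards against division by zero for identical signals
    return 10.0 * np.log10((np.sum(reference ** 2) + eps) / (np.sum(noise ** 2) + eps))


def _log_power_spectrogram(signal, n_fft, hop):
    """Log10 power spectrogram from Hann-windowed frames (assumed settings)."""
    signal = np.asarray(signal, dtype=np.float64)
    if len(signal) < n_fft:
        raise ValueError("signal must contain at least n_fft samples")
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    return np.log10(power + 1e-10)  # small floor avoids log10(0)


def log_spectral_distance(reference, estimate, n_fft=2048, hop=512):
    """One common LSD form: RMS over frequency of the log-power difference,
    averaged over frames. Lower is better; identical signals give 0."""
    ref_lps = _log_power_spectrogram(reference, n_fft, hop)
    est_lps = _log_power_spectrogram(estimate, n_fft, hop)
    per_frame = np.sqrt(np.mean((ref_lps - est_lps) ** 2, axis=1))
    return float(np.mean(per_frame))
```

Both functions expect time-aligned one-dimensional signals of equal length at the same sampling rate, for example a wideband reference and the bandwidth-extended output. Higher SNR and lower LSD indicate a closer match to the reference.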

Original language: English
Title of host publication: 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 866-870
Number of pages: 5
ISBN (Electronic): 9781509066315
Publication status: Published - May 2020
Event: 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020 - Barcelona, Spain
Duration: 4 May 2020 to 8 May 2020

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Volume: 2020-May
ISSN (Print): 1520-6149

Conference

Conference: 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020
Country/Territory: Spain
City: Barcelona
Period: 4/05/20 to 8/05/20
