TY - GEN
T1 - Time-domain neural network approach for speech bandwidth extension
AU - Hao, Xiang
AU - Xu, Chenglin
AU - Hou, Nana
AU - Xie, Lei
AU - Chng, Eng Siong
AU - Li, Haizhou
N1 - Publisher Copyright: © 2020 IEEE
PY - 2020/5
Y1 - 2020/5
N2 - In this paper, we study the time-domain neural network approach for speech bandwidth extension. We propose a network architecture, named multi-scale fusion neural network (MfNet), that gradually restores the low-frequency signal and predicts the high-frequency signal through the exchange of information across different scale representations. We propose a training scheme to optimize the network with a combination of perceptual loss and time-domain adversarial loss. Experiments show that the proposed multi-scale fusion network consistently outperforms the competing methods in terms of perceptual evaluation of speech quality (PESQ), signal-to-distortion ratio (SDR), signal-to-noise ratio (SNR), log-spectral distance (LSD), and word error rate (WER). More promisingly, the multi-scale fusion network requires only 10% of the parameters of the time-domain reference baseline.
KW - Deep learning
KW - Multi-scale fusion
KW - Neural networks
KW - Speech bandwidth extension
UR - http://www.scopus.com/inward/record.url?scp=85091153021&partnerID=8YFLogxK
U2 - 10.1109/ICASSP40776.2020.9054551
DO - 10.1109/ICASSP40776.2020.9054551
M3 - Conference contribution
AN - SCOPUS:85091153021
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 866
EP - 870
BT - 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020
Y2 - 4 May 2020 through 8 May 2020
ER -