Time-domain neural network approach for speech bandwidth extension

Xiang Hao; Chenglin Xu; Nana Hou; Lei Xie; Eng Siong Chng; Haizhou Li

doi:10.1109/ICASSP40776.2020.9054551

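School of Computer Science

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

12 Citations (Scopus)

Abstract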

In this paper, we study the time-domain neural network approach for speech bandwidth extension. We propose a network architecture, named multi-scale fusion neural network (MfNet), that gradually restores the low-frequency signal and predicts the high-frequency signal through the exchange of information across different scale representations. We propose a training scheme to optimize the network with a combination of perceptual loss and time-domain adversarial loss. Experiments show the proposed multi-scale fusion network consistently outperforms the competing methods in terms of perceptual evaluation of speech quality (PESQ), signal to distortion rate (SDR), signal to noise ratio (SNR), log-spectral distance (LSD) and word error rate (WER). More promisingly, the multi-scale fusion network requires only 10% of the parameters of the time-domain reference baseline.
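The abstract names SNR and LSD among its evaluation metrics but does not define them. The sketch below shows one common way to compute both between a reference waveform and an estimate. The function names (`snr_db`, `power_spectrum`, `lsd`), the non-overlapping 64-sample framing, and the `eps` floor are illustrative assumptions, not the authors' evaluation code.

```python
import cmath
import math


def snr_db(reference, estimate):
    """Signal-to-noise ratio in dB; the 'noise' is reference minus estimate."""
    signal_power = sum(r * r for r in reference)
    noise_power = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    if noise_power == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(signal_power / noise_power)


def power_spectrum(frame):
    """Power spectrum of one frame via a naive DFT (bins 0 .. n/2)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)
        )
        spectrum.append(abs(coeff) ** 2)
    return spectrum


def lsd(reference, estimate, frame_len=64, eps=1e-10):
    """Log-spectral distance averaged over non-overlapping frames.

    Assumed definition: per frame, the root mean square over frequency bins
    of the difference between log10 power spectra; then the mean over frames.
    """
    usable = min(len(reference), len(estimate))
    distances = []
    for start in range(0, usable - frame_len + 1, frame_len):
        ref_ps = power_spectrum(reference[start:start + frame_len])
        est_ps = power_spectrum(estimate[start:start + frame_len])
        squared = [
            (math.log10(r + eps) - math.log10(e + eps)) ** 2
            for r, e in zip(ref_ps, est_ps)
        ]
        distances.append(math.sqrt(sum(squared) / len(squared)))
    if not distances:
        raise ValueError("signals are shorter than one frame")
    return sum(distances) / len(distances)


if __name__ == "__main__":
    # Toy signals: a sine wave and a copy attenuated by half.
    x = [math.sin(2 * math.pi * 5 * t / 128) for t in range(128)]
    y = [0.5 * v for v in x]
    print("SNR (dB):", snr_db(x, y))
    print("LSD:", lsd(x, y))
```

Higher SNR and lower LSD indicate an estimate closer to the reference. A practical evaluation would normally use an FFT library and overlapping windowed frames rather than the naive DFT shown here.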

Original language	English
Title of host publication	2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020 - Proceedings
Publisher	Institute of Electrical and Electronics Engineers Inc.
Pages	866-870
Number of pages	5
ISBN (Electronic)	9781509066315
DOI	https://doi.org/10.1109/ICASSP40776.2020.9054551
Publication status	Published - May 2020
Event	2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020 - Barcelona, Spain. Duration: 4 May 2020 → 8 May 2020

Publication series

Name	ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Volume	2020-May
ISSN (Print)	1520-6149

Conference

Conference	2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020
Country/Territory	Spain
City	Barcelona
Period	4/05/20 → 8/05/20

Access to document

10.1109/ICASSP40776.2020.9054551

Other files and links

Link to publication in Scopus

Cite this

Hao, X., Xu, C., Hou, N., Xie, L., Chng, E. S., & Li, H. (2020). Time-domain neural network approach for speech bandwidth extension. In 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020 - Proceedings (pp. 866-870). Article 9054551 (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings; Vol. 2020-May). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICASSP40776.2020.9054551

Hao, Xiang ; Xu, Chenglin ; Hou, Nana et al. / Time-domain neural network approach for speech bandwidth extension. 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020 - Proceedings. Institute of Electrical and Electronics Engineers Inc., 2020. pp. 866-870 (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings).

@inproceedings{835133d71fd44a8b9d1f88524d10e74e,

title = "Time-domain neural network approach for speech bandwidth extension",

abstract = "In this paper, we study the time-domain neural network approach for speech bandwidth extension. We propose a network architecture, named multi-scale fusion neural network (MfNet), that gradually restores the low-frequency signal and predicts the high-frequency signal through the exchange of information across different scale representations. We propose a training scheme to optimize the network with a combination of perceptual loss and time-domain adversarial loss. Experiments show the proposed multi-scale fusion network consistently outperforms the competing methods in terms of perceptual evaluation of speech quality (PESQ), signal to distortion rate (SDR), signal to noise ratio (SNR), log-spectral distance (LSD) and word error rate (WER). More promisingly, the multi-scale fusion network requires only 10% of the parameters of the time-domain reference baseline.",

keywords = "Deep learning, Multi-scale fusion, Neural networks, Speech bandwidth extension",

author = "Xiang Hao and Chenglin Xu and Nana Hou and Lei Xie and Chng, {Eng Siong} and Haizhou Li",

note = "Publisher Copyright: {\textcopyright} 2020 IEEE; 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020 ; Conference date: 04-05-2020 Through 08-05-2020",

year = "2020",

month = may,

doi = "10.1109/ICASSP40776.2020.9054551",

language = "English",

series = "ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

pages = "866--870",

booktitle = "2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020 - Proceedings",

}

Hao, X, Xu, C, Hou, N, Xie, L, Chng, ES & Li, H 2020, Time-domain neural network approach for speech bandwidth extension. in 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020 - Proceedings., 9054551, ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, vol. 2020-May, Institute of Electrical and Electronics Engineers Inc., pp. 866-870, 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020, Barcelona, Spain, 4/05/20. https://doi.org/10.1109/ICASSP40776.2020.9054551

Time-domain neural network approach for speech bandwidth extension. / Hao, Xiang; Xu, Chenglin; Hou, Nana et al.
2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020 - Proceedings. Institute of Electrical and Electronics Engineers Inc., 2020. p. 866-870 9054551 (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings; Vol. 2020-May).

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

TY - GEN

T1 - Time-domain neural network approach for speech bandwidth extension

AU - Hao, Xiang

AU - Xu, Chenglin

AU - Hou, Nana

AU - Xie, Lei

AU - Chng, Eng Siong

AU - Li, Haizhou

PY - 2020/5

Y1 - 2020/5

N2 - In this paper, we study the time-domain neural network approach for speech bandwidth extension. We propose a network architecture, named multi-scale fusion neural network (MfNet), that gradually restores the low-frequency signal and predicts the high-frequency signal through the exchange of information across different scale representations. We propose a training scheme to optimize the network with a combination of perceptual loss and time-domain adversarial loss. Experiments show the proposed multi-scale fusion network consistently outperforms the competing methods in terms of perceptual evaluation of speech quality (PESQ), signal to distortion rate (SDR), signal to noise ratio (SNR), log-spectral distance (LSD) and word error rate (WER). More promisingly, the multi-scale fusion network requires only 10% of the parameters of the time-domain reference baseline.

AB - In this paper, we study the time-domain neural network approach for speech bandwidth extension. We propose a network architecture, named multi-scale fusion neural network (MfNet), that gradually restores the low-frequency signal and predicts the high-frequency signal through the exchange of information across different scale representations. We propose a training scheme to optimize the network with a combination of perceptual loss and time-domain adversarial loss. Experiments show the proposed multi-scale fusion network consistently outperforms the competing methods in terms of perceptual evaluation of speech quality (PESQ), signal to distortion rate (SDR), signal to noise ratio (SNR), log-spectral distance (LSD) and word error rate (WER). More promisingly, the multi-scale fusion network requires only 10% of the parameters of the time-domain reference baseline.

KW - Deep learning

KW - Multi-scale fusion

KW - Neural networks

KW - Speech bandwidth extension

UR - http://www.scopus.com/inward/record.url?scp=85091153021&partnerID=8YFLogxK

U2 - 10.1109/ICASSP40776.2020.9054551

DO - 10.1109/ICASSP40776.2020.9054551

M3 - Conference contribution

AN - SCOPUS:85091153021

T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings

SP - 866

EP - 870

BT - 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020 - Proceedings

PB - Institute of Electrical and Electronics Engineers Inc.

T2 - 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020

Y2 - 4 May 2020 through 8 May 2020

ER -

Hao X, Xu C, Hou N, Xie L, Chng ES, Li H. Time-domain neural network approach for speech bandwidth extension. In 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020 - Proceedings. Institute of Electrical and Electronics Engineers Inc. 2020. p. 866-870. 9054551. (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings). doi: 10.1109/ICASSP40776.2020.9054551
