A neural network approach for speech enhancement and noise-robust bandwidth extension

Xiang Hao; Chenglin Xu; Chen Zhang; Lei Xie

doi:10.1016/j.csl.2024.101709

School of Computer Science

Research output: Contribution to journal › Article › peer-review

Abstract

When a speech enhancement model processes noisy utterances with varying frequency bandwidths, the effective bandwidth of the enhanced speech often remains unchanged. However, high-frequency components are crucial for perceived audio quality, which underscores the need for noise-robust bandwidth extension in speech enhancement networks. In this study, we addressed this challenge by proposing a novel network architecture and loss function based on CAUNet, a state-of-the-art speech enhancement method. We introduced a multi-scale loss and a coordinate embedded upsampling block that enable bandwidth extension while preserving speech enhancement performance. Additionally, we proposed a gradient loss function that promotes convergence of the network and leads to significant performance improvements. Our experimental results validate these modifications and show that our approach outperforms competing methods.
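The abstract does not define the multi-scale loss or the gradient loss, so the following is only a hypothetical sketch of what such losses commonly look like, not the authors' CAUNet-based implementation. It assumes a multi-resolution STFT magnitude loss (spectral convergence plus log-magnitude L1, averaged over several STFT sizes) and a gradient loss defined as the L1 distance between time- and frequency-direction finite differences of log-magnitude spectrograms. The function names `stft_mag`, `multi_scale_loss`, and `gradient_loss`, the STFT scales, and the toy signals are all illustrative assumptions.

```python
import numpy as np

def stft_mag(x, n_fft, hop):
    """Magnitude STFT with a Hann window; a trailing partial frame is dropped."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=-1))  # shape: (frames, n_fft // 2 + 1)

# Hypothetical (n_fft, hop) scales; not taken from the paper.
SCALES = ((512, 128), (1024, 256), (2048, 512))

def multi_scale_loss(est, ref, scales=SCALES, eps=1e-7):
    """Assumed form: spectral convergence + log-magnitude L1, averaged over STFT scales."""
    total = 0.0
    for n_fft, hop in scales:
        s_est, s_ref = stft_mag(est, n_fft, hop), stft_mag(ref, n_fft, hop)
        sc = np.linalg.norm(s_ref - s_est) / (np.linalg.norm(s_ref) + eps)
        log_l1 = np.mean(np.abs(np.log(s_ref + eps) - np.log(s_est + eps)))
        total += sc + log_l1
    return total / len(scales)

def gradient_loss(est, ref, n_fft=512, hop=128, eps=1e-7):
    """Assumed form: L1 distance between time/frequency differences of log spectrograms."""
    l_est = np.log(stft_mag(est, n_fft, hop) + eps)
    l_ref = np.log(stft_mag(ref, n_fft, hop) + eps)
    loss = 0.0
    for axis in (0, 1):  # axis 0 = time frames, axis 1 = frequency bins
        loss += np.mean(np.abs(np.diff(l_est, axis=axis) - np.diff(l_ref, axis=axis)))
    return loss

# Toy check: a 16 kHz reference with a 6 kHz component vs. an estimate that lacks it,
# loosely mimicking an enhanced output whose bandwidth was not extended.
sr = 16000
t = np.arange(sr) / sr
clean = np.sin(2 * np.pi * 440 * t) + 0.3 * np.sin(2 * np.pi * 6000 * t)
narrow = np.sin(2 * np.pi * 440 * t)
print(multi_scale_loss(clean, clean), gradient_loss(clean, clean))  # 0.0 0.0
print(multi_scale_loss(narrow, clean) > 0, gradient_loss(narrow, clean) > 0)  # True True
```

In this sketch the log-magnitude terms weight quiet high-frequency bins more evenly than the spectral-convergence term does, so a missing high band is penalized even when it carries little energy; whether the paper's losses make the same design choice is not stated in the abstract.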

Original language	English
Article number	101709
Journal	Computer Speech and Language
Volume	89
DOIs	https://doi.org/10.1016/j.csl.2024.101709
State	Published - Jan 2025

Keywords

Neural networks
Speech bandwidth extension
Speech enhancement

Cite this

@article{a08f00c36af548b399243c7fa9416258,

title = "A neural network approach for speech enhancement and noise-robust bandwidth extension",

abstract = "When processing noisy utterances with varying frequency bandwidths using an enhancement model, the effective bandwidth of the resulting enhanced speech often remains unchanged. However, high-frequency components are crucial for perceived audio quality, underscoring the need for noise-robust bandwidth extension capabilities in speech enhancement networks. In this study, we addressed this challenge by proposing a novel network architecture and loss function based on the CAUNet, which is a state-of-the-art speech enhancement method. We introduced a multi-scale loss and implemented a coordinate embedded upsampling block to facilitate bandwidth extension while maintaining the ability of speech enhancement. Additionally, we proposed a gradient loss function to promote the neural network's convergence, leading to significant performance improvements. Our experimental results validate these modifications and clearly demonstrate the superiority of our approach over competing methods.",

keywords = "Neural networks, Speech bandwidth extension, Speech enhancement",

author = "Xiang Hao and Chenglin Xu and Chen Zhang and Lei Xie",

note = "Publisher Copyright: {\textcopyright} 2024 Elsevier Ltd",

year = "2025",

month = jan,

doi = "10.1016/j.csl.2024.101709",

language = "English",

volume = "89",

journal = "Computer Speech and Language",

issn = "0885-2308",

publisher = "Academic Press",

}

TY - JOUR

T1 - A neural network approach for speech enhancement and noise-robust bandwidth extension

AU - Hao, Xiang

AU - Xu, Chenglin

AU - Zhang, Chen

AU - Xie, Lei

PY - 2025/1

Y1 - 2025/1

N2 - When processing noisy utterances with varying frequency bandwidths using an enhancement model, the effective bandwidth of the resulting enhanced speech often remains unchanged. However, high-frequency components are crucial for perceived audio quality, underscoring the need for noise-robust bandwidth extension capabilities in speech enhancement networks. In this study, we addressed this challenge by proposing a novel network architecture and loss function based on the CAUNet, which is a state-of-the-art speech enhancement method. We introduced a multi-scale loss and implemented a coordinate embedded upsampling block to facilitate bandwidth extension while maintaining the ability of speech enhancement. Additionally, we proposed a gradient loss function to promote the neural network's convergence, leading to significant performance improvements. Our experimental results validate these modifications and clearly demonstrate the superiority of our approach over competing methods.

AB - When processing noisy utterances with varying frequency bandwidths using an enhancement model, the effective bandwidth of the resulting enhanced speech often remains unchanged. However, high-frequency components are crucial for perceived audio quality, underscoring the need for noise-robust bandwidth extension capabilities in speech enhancement networks. In this study, we addressed this challenge by proposing a novel network architecture and loss function based on the CAUNet, which is a state-of-the-art speech enhancement method. We introduced a multi-scale loss and implemented a coordinate embedded upsampling block to facilitate bandwidth extension while maintaining the ability of speech enhancement. Additionally, we proposed a gradient loss function to promote the neural network's convergence, leading to significant performance improvements. Our experimental results validate these modifications and clearly demonstrate the superiority of our approach over competing methods.

KW - Neural networks

KW - Speech bandwidth extension

KW - Speech enhancement

UR - http://www.scopus.com/inward/record.url?scp=85201257800&partnerID=8YFLogxK

U2 - 10.1016/j.csl.2024.101709

DO - 10.1016/j.csl.2024.101709

M3 - Article

AN - SCOPUS:85201257800

SN - 0885-2308

VL - 89

JO - Computer Speech and Language

JF - Computer Speech and Language

M1 - 101709

ER -
