Incorporation of a modified temporal cepstrum smoothing in both signal-to-noise ratio and speech presence probability estimation for speech enhancement

Dahan Wang; Zhongshu Hou; Yuxiang Hu; Changbao Zhu; Jing Lu; Jingdong Chen

doi:10.1121/10.0026223

School of Marine Science and Technology

Research output: Contribution to journal › Article › peer-review

Abstract

Numerous advanced and lightweight signal processing methods have been presented for single-channel speech enhancement (SE). It is imperative to carefully explore how to efficiently combine, integrate, and balance these methods. This paper proposes a more effective and less resource-intensive SE system, focused on the integration and adaptation of several approaches, especially the temporal cepstrum smoothing (TCS). First, a more robust fundamental frequency estimator is employed within TCS, mitigating the performance limitations caused by the inaccuracy of the original estimator. Additionally, a harmonic enhancement mechanism is introduced, effectively recovering the weak harmonic components. By incorporation of the modified TCS in the a posteriori speech presence probability estimation, the unbiased minimum mean square error noise power spectral density estimator can be refined. The modified TCS is also utilized for the a priori signal-to-noise ratio estimation. Moreover, this paper enhances the log-spectral amplitude estimator by applying both super-Gaussian speech priors and speech presence uncertainty for further improvement. Experimental evaluations demonstrate that the proposed method yields an improvement in speech quality while maintaining modest computational and storage requirements. Furthermore, the proposed system exhibits comparable performance to several baseline systems based on lightweight deep neural networks.

Original language	English
Pages (from-to)	3678-3689
Number of pages	12
Journal	Journal of the Acoustical Society of America
Volume	155
Issue number	6
DOI	https://doi.org/10.1121/10.0026223
Publication status	Published - 1 Jun 2024

Cite this

@article{0fbcfafe44344e489b1ba472a1d7b941,

title = "Incorporation of a modified temporal cepstrum smoothing in both signal-to-noise ratio and speech presence probability estimation for speech enhancement",

abstract = "Numerous advanced and lightweight signal processing methods have been presented for single-channel speech enhancement (SE). It is imperative to carefully explore how to efficiently combine, integrate, and balance these methods. This paper proposes a more effective and less resource-intensive SE system, focused on the integration and adaptation of several approaches, especially the temporal cepstrum smoothing (TCS). First, a more robust fundamental frequency estimator is employed within TCS, mitigating the performance limitations caused by the inaccuracy of the original estimator. Additionally, a harmonic enhancement mechanism is introduced, effectively recovering the weak harmonic components. By incorporation of the modified TCS in the a posteriori speech presence probability estimation, the unbiased minimum mean square error noise power spectral density estimator can be refined. The modified TCS is also utilized for the a priori signal-to-noise ratio estimation. Moreover, this paper enhances the log-spectral amplitude estimator by applying both super-Gaussian speech priors and speech presence uncertainty for further improvement. Experimental evaluations demonstrate that the proposed method yields an improvement in speech quality while maintaining modest computational and storage requirements. Furthermore, the proposed system exhibits comparable performance to several baseline systems based on lightweight deep neural networks.",

author = "Dahan Wang and Zhongshu Hou and Yuxiang Hu and Changbao Zhu and Jing Lu and Jingdong Chen",

note = "Publisher Copyright: {\textcopyright} 2024 Acoustical Society of America.",

year = "2024",

month = jun,

day = "1",

doi = "10.1121/10.0026223",

language = "English",

volume = "155",

pages = "3678--3689",

journal = "Journal of the Acoustical Society of America",

issn = "0001-4966",

publisher = "Acoustical Society of America",

number = "6",

}

TY - JOUR

T1 - Incorporation of a modified temporal cepstrum smoothing in both signal-to-noise ratio and speech presence probability estimation for speech enhancement

AU - Wang, Dahan

AU - Hou, Zhongshu

AU - Hu, Yuxiang

AU - Zhu, Changbao

AU - Lu, Jing

AU - Chen, Jingdong

PY - 2024/6/1

Y1 - 2024/6/1

N2 - Numerous advanced and lightweight signal processing methods have been presented for single-channel speech enhancement (SE). It is imperative to carefully explore how to efficiently combine, integrate, and balance these methods. This paper proposes a more effective and less resource-intensive SE system, focused on the integration and adaptation of several approaches, especially the temporal cepstrum smoothing (TCS). First, a more robust fundamental frequency estimator is employed within TCS, mitigating the performance limitations caused by the inaccuracy of the original estimator. Additionally, a harmonic enhancement mechanism is introduced, effectively recovering the weak harmonic components. By incorporation of the modified TCS in the a posteriori speech presence probability estimation, the unbiased minimum mean square error noise power spectral density estimator can be refined. The modified TCS is also utilized for the a priori signal-to-noise ratio estimation. Moreover, this paper enhances the log-spectral amplitude estimator by applying both super-Gaussian speech priors and speech presence uncertainty for further improvement. Experimental evaluations demonstrate that the proposed method yields an improvement in speech quality while maintaining modest computational and storage requirements. Furthermore, the proposed system exhibits comparable performance to several baseline systems based on lightweight deep neural networks.

UR - http://www.scopus.com/inward/record.url?scp=85195439173&partnerID=8YFLogxK

U2 - 10.1121/10.0026223

DO - 10.1121/10.0026223

M3 - Article

C2 - 38847592

AN - SCOPUS:85195439173

SN - 0001-4966

VL - 155

SP - 3678

EP - 3689

JO - Journal of the Acoustical Society of America

JF - Journal of the Acoustical Society of America

IS - 6

ER -
