Optimizing the Perceptual Quality of Time-Domain Speech Enhancement with Reinforcement Learning

Xiang Hao, Chenglin Xu, Lei Xie, Haizhou Li

Research output: Contribution to journal › Article › peer-review

8 Citations (Scopus)

Abstract

In neural speech enhancement, a mismatch exists between the training objective, typically Mean-Square Error (MSE), and perceptual quality evaluation metrics such as Perceptual Evaluation of Speech Quality (PESQ) and Short-Time Objective Intelligibility (STOI). We propose a novel reinforcement learning algorithm and network architecture that incorporate a non-differentiable perceptual quality evaluation metric into the objective function using a dynamic filter module. Unlike the traditional dynamic filter implementation, which directly generates a convolution kernel, we use a filter generation agent to predict the probability density function of a multivariate Gaussian distribution, from which we sample the convolution kernel. Experimental results show that the proposed reinforcement learning method clearly improves perceptual quality over supervised learning methods trained with the MSE objective function.
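
The sampling idea in the abstract can be illustrated with a small score-function (REINFORCE-style) sketch. This is not the authors' implementation. It assumes a diagonal Gaussian over the kernel taps, treats the mean `mu` and `log_sigma` as free parameters rather than outputs of a filter generation network, and uses a hypothetical `toy_reward` (a rounded SNR) as a stand-in for a non-differentiable metric such as PESQ or STOI. All function names here are illustrative.

```python
import numpy as np


def sample_kernel(mu, log_sigma, rng):
    """Draw a convolution kernel from a diagonal Gaussian N(mu, diag(sigma^2))."""
    sigma = np.exp(log_sigma)
    return mu + sigma * rng.standard_normal(mu.shape)


def log_prob_grads(kernel, mu, log_sigma):
    """Gradients of log N(kernel; mu, sigma^2) w.r.t. mu and log_sigma."""
    sigma = np.exp(log_sigma)
    z = (kernel - mu) / sigma
    return z / sigma, z ** 2 - 1.0


def toy_reward(enhanced, clean):
    """Hypothetical stand-in for PESQ/STOI: SNR in dB, rounded to 0.1 dB.

    The rounding makes the reward piecewise constant, so it provides
    no useful gradient with respect to the kernel.
    """
    noise_power = np.sum((clean - enhanced) ** 2) + 1e-12
    snr_db = 10.0 * np.log10(np.sum(clean ** 2) / noise_power)
    return float(np.round(snr_db, 1))


def reinforce_step(noisy, clean, mu, log_sigma, rng, n_samples=16, lr=1e-4):
    """One score-function update of the Gaussian parameters.

    Assumption: a mean-reward baseline is used for variance reduction;
    the abstract does not say which baseline, if any, the paper uses.
    """
    kernels = [sample_kernel(mu, log_sigma, rng) for _ in range(n_samples)]
    rewards = np.array(
        [toy_reward(np.convolve(noisy, k, mode="same"), clean) for k in kernels]
    )
    advantages = rewards - rewards.mean()

    grad_mu = np.zeros_like(mu)
    grad_log_sigma = np.zeros_like(log_sigma)
    for k, adv in zip(kernels, advantages):
        d_mu, d_log_sigma = log_prob_grads(k, mu, log_sigma)
        grad_mu += adv * d_mu
        grad_log_sigma += adv * d_log_sigma

    # Gradient ascent on the expected reward.
    new_mu = mu + lr * grad_mu / n_samples
    new_log_sigma = log_sigma + lr * grad_log_sigma / n_samples
    return new_mu, new_log_sigma, float(rewards.mean())
```

Because the reward enters only as a scalar weight on the log-probability gradient, the metric itself never needs to be differentiated. This is the property that lets a metric like PESQ appear in the training objective.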

Original language: English
Pages (from-to): 939-947
Number of pages: 9
Journal: Tsinghua Science and Technology
Volume: 27
Issue number: 6
Publication status: Published - 1 Dec 2022
