TY - JOUR
T1 - Optimizing the Perceptual Quality of Time-Domain Speech Enhancement with Reinforcement Learning
AU - Hao, Xiang
AU - Xu, Chenglin
AU - Xie, Lei
AU - Li, Haizhou
N1 - Publisher Copyright: © 1996-2012 Tsinghua University Press.
PY - 2022/12/1
Y1 - 2022/12/1
N2 - In neural speech enhancement, a mismatch exists between the training objective, i.e., Mean-Square Error (MSE), and perceptual quality evaluation metrics, i.e., perceptual evaluation of speech quality and short-time objective intelligibility. We propose a novel reinforcement learning algorithm and network architecture, which incorporate a non-differentiable perceptual quality evaluation metric into the objective function using a dynamic filter module. Unlike the traditional dynamic filter implementation that directly generates a convolution kernel, we use a filter generation agent to predict the probability density function of a multivariate Gaussian distribution, from which we sample the convolution kernel. Experimental results show that the proposed reinforcement learning method clearly improves the perceptual quality over other supervised learning methods with the MSE objective function.
KW - dynamic filter
KW - neural networks
KW - reinforcement learning
KW - speech enhancement
UR - http://www.scopus.com/inward/record.url?scp=85133706189&partnerID=8YFLogxK
U2 - 10.26599/TST.2021.9010048
DO - 10.26599/TST.2021.9010048
M3 - Article
AN - SCOPUS:85133706189
SN - 1007-0214
VL - 27
SP - 939
EP - 947
JO - Tsinghua Science and Technology
JF - Tsinghua Science and Technology
IS - 6
ER -