Optimizing the Perceptual Quality of Time-Domain Speech Enhancement with Reinforcement Learning

Xiang Hao, Chenglin Xu, Lei Xie, Haizhou Li

Research output: Contribution to journal › Article › peer-review

8 Scopus citations

Abstract

In neural speech enhancement, there is a mismatch between the training objective, typically the mean-square error (MSE), and the perceptual quality metrics used for evaluation, such as perceptual evaluation of speech quality (PESQ) and short-time objective intelligibility (STOI). We propose a novel reinforcement learning algorithm and network architecture that incorporate a non-differentiable perceptual quality evaluation metric into the objective function using a dynamic filter module. Unlike the traditional dynamic filter implementation, which directly generates a convolution kernel, we use a filter generation agent to predict the probability density function of a multivariate Gaussian distribution, from which we sample the convolution kernel. Experimental results show that the proposed reinforcement learning method clearly improves perceptual quality over supervised learning methods trained with the MSE objective function.
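The idea of sampling a convolution kernel from a Gaussian and scoring the result with a non-differentiable metric can be illustrated with a generic score-function (REINFORCE) update. The following is only a minimal sketch under several assumptions that are not taken from the article: the "agent" is reduced to a fixed mean vector `mu` with a diagonal covariance `sigma` instead of a neural network, `toy_reward` is a hypothetical rounded-SNR stand-in for a metric such as PESQ, and all function names and hyperparameters are illustrative.

```python
import numpy as np

def sample_kernel(mu, sigma, rng):
    """Draw a 1-D convolution kernel from N(mu, diag(sigma^2))."""
    return mu + sigma * rng.standard_normal(mu.shape)

def grad_log_prob_mu(kernel, mu, sigma):
    """Gradient of log N(kernel; mu, diag(sigma^2)) with respect to mu."""
    return (kernel - mu) / sigma**2

def toy_reward(enhanced, clean):
    """Hypothetical stand-in for a non-differentiable metric (NOT PESQ).
    Rounding makes the score piecewise constant, so it has no useful gradient."""
    noise_power = np.sum((clean - enhanced) ** 2) + 1e-12
    snr_db = 10.0 * np.log10(np.sum(clean**2) / noise_power)
    return float(np.round(snr_db))

def reinforce_step(mu, sigma, noisy, clean, rng, lr=1e-3, n_samples=8):
    """One score-function update of the kernel mean (assumed update rule)."""
    kernels = [sample_kernel(mu, sigma, rng) for _ in range(n_samples)]
    rewards = np.array(
        [toy_reward(np.convolve(noisy, k, mode="same"), clean) for k in kernels]
    )
    # Subtracting the batch mean is a simple baseline that reduces variance.
    advantages = rewards - rewards.mean()
    grad = sum(
        a * grad_log_prob_mu(k, mu, sigma) for a, k in zip(advantages, kernels)
    ) / n_samples
    return mu + lr * grad, float(rewards.mean())

# Toy usage: a sinusoid corrupted by white noise, starting from an identity kernel.
rng = np.random.default_rng(0)
t = np.arange(256)
clean = np.sin(2 * np.pi * t / 32)
noisy = clean + 0.3 * rng.standard_normal(t.size)

mu = np.zeros(9)
mu[4] = 1.0                 # identity filter as the initial mean
sigma = np.full(9, 0.05)    # fixed exploration noise (assumption)

for _ in range(50):
    mu, avg_reward = reinforce_step(mu, sigma, noisy, clean, rng)
```

The reward only enters the update as a scalar weight on the gradient of the log-density, which is why the metric itself never needs to be differentiated. In a setting closer to the one described above, `mu` and `sigma` would be predicted per input by a network rather than held as free parameters.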

Original language: English
Pages (from-to): 939-947
Number of pages: 9
Journal: Tsinghua Science and Technology
Volume: 27
Issue number: 6
State: Published - 1 Dec 2022

Keywords

  • dynamic filter
  • neural networks
  • reinforcement learning
  • speech enhancement
