Plug-and-Play MVDR Beamforming for Speech Separation

Chengbo Chang, Ziye Yang, Jie Chen

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

1 Scopus citation

Abstract

As an adaptive beamformer, the Minimum Variance Distortionless Response (MVDR) method has proven effective at separating target speech from background noise and interference. Conventionally, MVDR relies on physical information about signal angles and covariance matrices, but it ignores the fact that the beamformer output can benefit from the prior structure of speech signals. Motivated by recent advances in integrating physics-based and data-driven approaches, this paper introduces a novel speech separation framework. Our approach enhances MVDR by incorporating Plug-and-Play (PnP) techniques to capture speech priors, specifically employing the Regularization by Denoising (RED) method to integrate prior speech information obtained from data into the optimization process. Experimental results validate the effectiveness of the proposed approach.
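
For readers unfamiliar with the baseline the abstract builds on, the following is a minimal sketch of the classical narrowband MVDR weights, w = R^{-1} d / (d^H R^{-1} d). It is the textbook formula only and does not reproduce the PnP/RED method proposed in the paper. The uniform linear array with half-wavelength spacing, the toy covariance matrix, and the function names `steering_vector` and `mvdr_weights` are all assumptions made for illustration.

```python
import numpy as np


def steering_vector(num_mics, angle_rad, spacing_wavelengths=0.5):
    """Far-field steering vector for a uniform linear array (assumed geometry)."""
    m = np.arange(num_mics)
    return np.exp(-2j * np.pi * spacing_wavelengths * m * np.sin(angle_rad))


def mvdr_weights(noise_cov, steer):
    """Classical MVDR weights: w = R^{-1} d / (d^H R^{-1} d)."""
    r_inv_d = np.linalg.solve(noise_cov, steer)  # R^{-1} d without an explicit inverse
    return r_inv_d / (steer.conj() @ r_inv_d)


# Toy scene (hypothetical values): 4 microphones, target at 0 rad,
# one interferer at 0.6 rad, plus spatially white sensor noise.
target = steering_vector(4, 0.0)
interferer = steering_vector(4, 0.6)
noise_cov = 10.0 * np.outer(interferer, interferer.conj()) + 0.1 * np.eye(4)

w = mvdr_weights(noise_cov, target)

# Beamformer response w^H a toward each direction.
target_gain = abs(w.conj() @ target)          # distortionless constraint: unit gain
interferer_gain = abs(w.conj() @ interferer)  # interferer is strongly attenuated
```

The weights depend only on the steering vector and the covariance matrix, which is the purely physical information the abstract refers to; no speech prior enters this baseline.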

Original language: English
Title of host publication: 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2024 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1346-1350
Number of pages: 5
ISBN (Electronic): 9798350344851
DOIs
State: Published - 2024
Event: 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2024 - Seoul, Korea, Republic of
Duration: 14 Apr 2024 to 19 Apr 2024

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
ISSN (Print): 1520-6149

Conference

Conference: 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2024
Country/Territory: Korea, Republic of
City: Seoul
Period: 14/04/24 to 19/04/24

Keywords

  • MVDR beamforming
  • PnP strategy
  • Speech separation
  • deep speech priors
