A two stage mask estimation approach to robust speaker verification

Yali Zhao, Lei Xie, Zhonghua Fu

Research output: Chapter in book/report/conference proceeding, conference contribution, peer-reviewed

Abstract

We propose a two-stage mask estimation approach to robust speaker verification (SV) in noisy environments. We consider a practical semi-blind SV scenario: the location of the target speaker is fixed while the locations of all interferers are unknown. In the first stage, we use a dual-microphone setup and a semi-blind degenerate unmixing estimation technique (DUET) to estimate an initial binary mask. In the second stage, we refine the mask based on the time and frequency histograms of the initial mask. As a result, only highly reliable time-frequency components in the spectro-temporal features are kept for downstream verification. Experiments show that the proposed approach is superior to a baseline MFCC approach and a recent local-SNR-based mask estimation approach.
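
The two stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact rules, so everything below is an assumption made for illustration. Stage 1 is reduced to a delay-only, DUET-style test (full DUET also uses inter-channel attenuation and histogram clustering) and assumes the target's inter-microphone delay is known because its location is fixed. Stage 2 assumes a simple refinement rule: drop frames and frequency bins whose share of active mask cells falls below a threshold. The function names `initial_mask` and `refine_mask`, the thresholds, and the synthetic data are all hypothetical.

```python
import numpy as np

def initial_mask(X1, X2, omega, target_delay, tol=0.2):
    """Stage 1 (simplified, delay-only DUET-style test; an assumption).

    X1, X2: complex STFTs of the two microphones, shape (F, T).
    omega: nonzero angular frequencies in rad/sample, shape (F,).
    Keeps cells whose estimated inter-channel delay (in samples)
    lies within `tol` of the known target delay.
    """
    cross = X2 * np.conj(X1)                   # phase equals angle(X2 / X1)
    delay = -np.angle(cross) / omega[:, None]  # per-cell delay estimate
    return np.abs(delay - target_delay) < tol

def refine_mask(mask, min_time_frac=0.3, min_freq_frac=0.3):
    """Stage 2 (assumed rule, not taken from the paper).

    Builds a time histogram (active fraction per frame) and a frequency
    histogram (active fraction per bin), then keeps only cells lying in
    both a sufficiently active frame and a sufficiently active bin.
    """
    time_hist = mask.mean(axis=0)   # shape (T,)
    freq_hist = mask.mean(axis=1)   # shape (F,)
    keep_t = time_hist >= min_time_frac
    keep_f = freq_hist >= min_freq_frac
    return mask & keep_f[:, None] & keep_t[None, :]

# Synthetic demo: each cell belongs either to the target or to an interferer.
rng = np.random.default_rng(0)
F, T = 8, 10
omega = np.linspace(0.2, 3.0, F)      # avoids omega = 0 and phase wrapping
owner = np.zeros((F, T), dtype=bool)  # True where the target dominates
owner[:6, :6] = True                  # a dense target region
owner[7, 9] = True                    # one isolated, unreliable cell
d_target, d_interf = 0.2, -0.7        # hypothetical delays in samples
true_delay = np.where(owner, d_target, d_interf)

X1 = rng.standard_normal((F, T)) + 1j * rng.standard_normal((F, T))
X2 = X1 * np.exp(-1j * omega[:, None] * true_delay)

init = initial_mask(X1, X2, omega, d_target)
refined = refine_mask(init)
print("active cells before/after refinement:", init.sum(), refined.sum())
```

In this toy setting the first stage recovers the cells dominated by the target, and the second stage discards the isolated cell because its frame and its frequency bin are both sparsely populated. Real recordings would need an STFT front end and handling of phase wrapping at high frequencies, which this sketch sidesteps by construction.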

Original language: English
Title of host publication: 13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012
Pages: 2653-2656
Number of pages: 4
Publication status: Published - 2012
Event: 13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012 - Portland, OR, United States
Duration: 9 Sep 2012 to 13 Sep 2012

Publication series

Name: 13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012
Volume: 3

Conference

Conference: 13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012
Country/Territory: United States
City: Portland, OR
Period: 9/09/12 to 13/09/12
