Dual-microphone based binary mask estimation for robust speaker verification

Yali Zhao, Zhong Hua Fu, Lei Xie, Jian Zhang, Yanning Zhang

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

Missing feature theory (MFT) has shown great potential for robust speaker recognition in noisy environments. Accurate estimation of the binary mask is crucial in MFT-based speaker recognition. This paper addresses the speaker verification problem using MFT in a practical scenario: the location of the target speaker is fixed, while the locations of the noise interferences are unknown. Specifically, we propose a dual-microphone semi-blind approach to estimate the binary mask. During system initialization, a spatial location model for the target is trained. A spatial model for the corrupted speech is then obtained on-line by model adaptation. Finally, the binary mask is estimated by likelihood comparison. Moreover, we propose a reliable frame selection method that restricts missing-data speaker recognition to the reliable speech frames. Experimental results demonstrate that the proposed approach achieves substantial improvements in recognition performance under both white-noise and speech-corrupted conditions.
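
The mask-estimation and frame-selection steps described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: each time-frequency unit is summarised by one scalar spatial cue, both spatial models are single 1-D Gaussians given as `(mean, variance)`, and the names `gaussian_loglik`, `estimate_binary_mask`, `select_reliable_frames` and the `min_fraction` threshold are hypothetical. The paper's actual features, model family and adaptation procedure are not specified in this record.

```python
import numpy as np


def gaussian_loglik(x, mean, var):
    """Elementwise log-likelihood of x under a 1-D Gaussian (assumed model form)."""
    return -0.5 * (np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)


def estimate_binary_mask(cue, target_model, mixture_model):
    """Label a time-frequency unit reliable (1) when the target spatial model
    explains its cue better than the adapted corrupted-speech model.

    cue: array of shape (frames, channels) holding one hypothetical spatial cue per unit.
    target_model, mixture_model: (mean, variance) tuples, assumed Gaussian.
    """
    ll_target = gaussian_loglik(cue, *target_model)
    ll_mixture = gaussian_loglik(cue, *mixture_model)
    return (ll_target > ll_mixture).astype(np.uint8)


def select_reliable_frames(mask, min_fraction=0.5):
    """Return indices of frames whose share of reliable units is at least min_fraction.

    The threshold rule is a stand-in; the paper's selection criterion may differ.
    """
    return np.flatnonzero(mask.mean(axis=1) >= min_fraction)


if __name__ == "__main__":
    # Toy cues: values near 0.0 match the assumed target location.
    cue = np.array([[0.0, 0.1], [1.0, 0.9], [0.0, 1.0]])
    mask = estimate_binary_mask(cue, target_model=(0.0, 0.04), mixture_model=(0.5, 1.0))
    print(mask)
    print(select_reliable_frames(mask))
```

In this toy setup the narrow target model wins only for cues close to the assumed target location, so units dominated by interference are masked out, and frames with too few reliable units are dropped before scoring.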

Original language: English
Title of host publication: ICALIP 2012 - 2012 International Conference on Audio, Language and Image Processing, Proceedings
Pages: 1014-1019
Number of pages: 6
State: Published - 2012
Event: 2012 3rd IEEE/IET International Conference on Audio, Language and Image Processing, ICALIP 2012 - Shanghai, China
Duration: 16 Jul 2012 to 18 Jul 2012

Publication series

Name: ICALIP 2012 - 2012 International Conference on Audio, Language and Image Processing, Proceedings

Conference

Conference: 2012 3rd IEEE/IET International Conference on Audio, Language and Image Processing, ICALIP 2012
Country/Territory: China
City: Shanghai
Period: 16/07/12 to 18/07/12
