TY - GEN
T1 - Dual-microphone based binary mask estimation for robust speaker verification
AU - Zhao, Yali
AU - Fu, Zhong Hua
AU - Xie, Lei
AU - Zhang, Jian
AU - Zhang, Yanning
PY - 2012
Y1 - 2012
N2 - Missing feature theory (MFT) has shown great potential for robust speaker recognition in noisy environments. Accurate estimation of the binary mask is crucial in MFT-based speaker recognition. This paper addresses the speaker verification problem using MFT in a practical scenario: the location of the target speaker is fixed while the locations of the noise interferences are unknown. Specifically, we propose a dual-microphone semi-blind approach to estimate the binary mask. During system initialization, a spatial location model for the target is trained precisely. Then a spatial model for corrupted speech is obtained on-line by model adaptation. Finally, the binary mask is estimated by likelihood comparison. Moreover, we propose a reliable frame selection method to further focus on the reliable speech frames for missing data speaker recognition. Experimental results demonstrate that our proposed approach achieves substantial improvements in recognition performance in both white noise and speech-corrupted conditions.
AB - Missing feature theory (MFT) has shown great potential for robust speaker recognition in noisy environments. Accurate estimation of the binary mask is crucial in MFT-based speaker recognition. This paper addresses the speaker verification problem using MFT in a practical scenario: the location of the target speaker is fixed while the locations of the noise interferences are unknown. Specifically, we propose a dual-microphone semi-blind approach to estimate the binary mask. During system initialization, a spatial location model for the target is trained precisely. Then a spatial model for corrupted speech is obtained on-line by model adaptation. Finally, the binary mask is estimated by likelihood comparison. Moreover, we propose a reliable frame selection method to further focus on the reliable speech frames for missing data speaker recognition. Experimental results demonstrate that our proposed approach achieves substantial improvements in recognition performance in both white noise and speech-corrupted conditions.
UR - http://www.scopus.com/inward/record.url?scp=84872137522&partnerID=8YFLogxK
U2 - 10.1109/ICALIP.2012.6376764
DO - 10.1109/ICALIP.2012.6376764
M3 - Conference contribution
AN - SCOPUS:84872137522
SN - 9781467301718
T3 - ICALIP 2012 - 2012 International Conference on Audio, Language and Image Processing, Proceedings
SP - 1014
EP - 1019
BT - ICALIP 2012 - 2012 International Conference on Audio, Language and Image Processing, Proceedings
T2 - 2012 3rd IEEE/IET International Conference on Audio, Language and Image Processing, ICALIP 2012
Y2 - 16 July 2012 through 18 July 2012
ER -