Dual-microphone based binary mask estimation for robust speaker verification

Yali Zhao; Zhong Hua Fu; Lei Xie; Jian Zhang; Yanning Zhang

doi:10.1109/ICALIP.2012.6376764

School of Computer Science

Northwestern Polytechnical University Xian

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Missing feature theory (MFT) has shown great potential for robust speaker recognition in noisy environments. Accurate estimation of the binary mask is crucial in MFT-based speaker recognition. This paper addresses the speaker verification problem using MFT in a practical scenario: the location of the target speaker is fixed while the locations of the interfering noise sources are unknown. Specifically, we propose a dual-microphone semi-blind approach to estimate the binary mask. During system initialization, a spatial location model for the target is trained precisely. Then a spatial model for the corrupted speech is obtained on-line by model adaptation. Finally, the binary mask is estimated by likelihood comparison. Moreover, we propose a reliable frame selection method to further focus on the reliable speech frames for missing data speaker recognition. Experimental results demonstrate that our proposed approach achieves substantial improvements in recognition performance under both white-noise and speech-corrupted conditions.
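
The abstract does not give the form of the spatial models or the frame selection rule, so the following is only a minimal sketch of the likelihood-comparison idea, not the authors' method. It assumes each time-frequency unit carries one scalar spatial cue and that both the target model and the adapted corrupted-speech model are single 1-D Gaussians. All function names, model parameters, cue values, and the `min_fraction` threshold are hypothetical.

```python
import math

def gaussian_loglik(x, mean, var):
    """Log-density of a 1-D Gaussian at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)

def estimate_binary_mask(cues, target_model, mixture_model):
    """Mark a time-frequency unit reliable (1) when the target model
    explains its spatial cue at least as well as the mixture model."""
    t_mean, t_var = target_model
    m_mean, m_var = mixture_model
    return [
        [1 if gaussian_loglik(c, t_mean, t_var) >= gaussian_loglik(c, m_mean, m_var) else 0
         for c in frame]
        for frame in cues
    ]

def select_reliable_frames(mask, min_fraction=0.5):
    """Return indices of frames whose share of reliable units reaches min_fraction."""
    return [i for i, frame in enumerate(mask) if sum(frame) / len(frame) >= min_fraction]

# Toy cues: rows are frames, columns are frequency bands (made-up values).
cues = [
    [0.02, -0.05, 0.90],
    [0.80, 1.10, 0.95],
    [0.01, 0.03, -0.02],
]
target_model = (0.0, 0.04)   # (mean, variance); assumed trained at initialization
mixture_model = (0.5, 0.50)  # (mean, variance); assumed adapted on-line

mask = estimate_binary_mask(cues, target_model, mixture_model)
frames = select_reliable_frames(mask, min_fraction=0.5)
print(mask)    # [[1, 1, 0], [0, 0, 0], [1, 1, 1]]
print(frames)  # [0, 2]
```

In this toy setup, units whose cue lies near the assumed target location are kept, and the second frame is dropped because none of its units are judged reliable.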

Original language	English
Title of host publication	ICALIP 2012 - 2012 International Conference on Audio, Language and Image Processing, Proceedings
Pages	1014-1019
Number of pages	6
DOIs	https://doi.org/10.1109/ICALIP.2012.6376764
State	Published - 2012
Event	2012 3rd IEEE/IET International Conference on Audio, Language and Image Processing, ICALIP 2012 - Shanghai, China
Duration	16 Jul 2012 → 18 Jul 2012

Publication series

Name	ICALIP 2012 - 2012 International Conference on Audio, Language and Image Processing, Proceedings

Conference

Conference	2012 3rd IEEE/IET International Conference on Audio, Language and Image Processing, ICALIP 2012
Country/Territory	China
City	Shanghai
Period	16/07/12 → 18/07/12

Cite this

Zhao, Y., Fu, Z. H., Xie, L., Zhang, J., & Zhang, Y. (2012). Dual-microphone based binary mask estimation for robust speaker verification. In ICALIP 2012 - 2012 International Conference on Audio, Language and Image Processing, Proceedings (pp. 1014-1019). Article 6376764 (ICALIP 2012 - 2012 International Conference on Audio, Language and Image Processing, Proceedings). https://doi.org/10.1109/ICALIP.2012.6376764

@inproceedings{9aa23debac014c02bea259d4ec99351b,

title = "Dual-microphone based binary mask estimation for robust speaker verification",

abstract = "Missing feature theory (MFT) has shown great potential for robust speaker recognition in noisy environments. Accurate estimation of binary mask is crucial in MFT-based speaker recognition. This paper addresses the speaker verification problem using MFT in a practical scenario: the location of target speaker is fixed while the locations of noise interferences are unknown. Specifically, we propose a dual-microphone semi-blind approach to estimate the binary mask. During system initialization, a spatial location model for the target is trained precisely. Then a spatial model for corrupted speech is obtained on-line by model adaptation. Finally, the binary mask is estimated by likelihood comparison. Moreover, we propose a reliable frame selection method to further focus on the reliable speech frames for missing data speaker recognition. Experimental results demonstrate that our proposed approach achieves substantial improvements in recognition performance in both white noise and speech corrupted conditions.",

author = "Yali Zhao and Fu, {Zhong Hua} and Lei Xie and Jian Zhang and Yanning Zhang",

year = "2012",

doi = "10.1109/ICALIP.2012.6376764",

language = "English",

isbn = "9781467301718",

series = "ICALIP 2012 - 2012 International Conference on Audio, Language and Image Processing, Proceedings",

pages = "1014--1019",

booktitle = "ICALIP 2012 - 2012 International Conference on Audio, Language and Image Processing, Proceedings",

note = "2012 3rd IEEE/IET International Conference on Audio, Language and Image Processing, ICALIP 2012 ; Conference date: 16-07-2012 Through 18-07-2012",

}

Zhao, Y, Fu, ZH, Xie, L, Zhang, J & Zhang, Y 2012, Dual-microphone based binary mask estimation for robust speaker verification. in ICALIP 2012 - 2012 International Conference on Audio, Language and Image Processing, Proceedings., 6376764, ICALIP 2012 - 2012 International Conference on Audio, Language and Image Processing, Proceedings, pp. 1014-1019, 2012 3rd IEEE/IET International Conference on Audio, Language and Image Processing, ICALIP 2012, Shanghai, China, 16/07/12. https://doi.org/10.1109/ICALIP.2012.6376764

Dual-microphone based binary mask estimation for robust speaker verification. / Zhao, Yali; Fu, Zhong Hua; Xie, Lei et al.
ICALIP 2012 - 2012 International Conference on Audio, Language and Image Processing, Proceedings. 2012. p. 1014-1019 6376764 (ICALIP 2012 - 2012 International Conference on Audio, Language and Image Processing, Proceedings).

TY - GEN

T1 - Dual-microphone based binary mask estimation for robust speaker verification

AU - Zhao, Yali

AU - Fu, Zhong Hua

AU - Xie, Lei

AU - Zhang, Jian

AU - Zhang, Yanning

PY - 2012

Y1 - 2012

AB - Missing feature theory (MFT) has shown great potential for robust speaker recognition in noisy environments. Accurate estimation of binary mask is crucial in MFT-based speaker recognition. This paper addresses the speaker verification problem using MFT in a practical scenario: the location of target speaker is fixed while the locations of noise interferences are unknown. Specifically, we propose a dual-microphone semi-blind approach to estimate the binary mask. During system initialization, a spatial location model for the target is trained precisely. Then a spatial model for corrupted speech is obtained on-line by model adaptation. Finally, the binary mask is estimated by likelihood comparison. Moreover, we propose a reliable frame selection method to further focus on the reliable speech frames for missing data speaker recognition. Experimental results demonstrate that our proposed approach achieves substantial improvements in recognition performance in both white noise and speech corrupted conditions.

UR - http://www.scopus.com/inward/record.url?scp=84872137522&partnerID=8YFLogxK

U2 - 10.1109/ICALIP.2012.6376764

DO - 10.1109/ICALIP.2012.6376764

M3 - Conference contribution

AN - SCOPUS:84872137522

SN - 9781467301718

T3 - ICALIP 2012 - 2012 International Conference on Audio, Language and Image Processing, Proceedings

SP - 1014

EP - 1019

BT - ICALIP 2012 - 2012 International Conference on Audio, Language and Image Processing, Proceedings

T2 - 2012 3rd IEEE/IET International Conference on Audio, Language and Image Processing, ICALIP 2012

Y2 - 16 July 2012 through 18 July 2012

ER -

Zhao Y, Fu ZH, Xie L, Zhang J, Zhang Y. Dual-microphone based binary mask estimation for robust speaker verification. In ICALIP 2012 - 2012 International Conference on Audio, Language and Image Processing, Proceedings. 2012. p. 1014-1019. 6376764. (ICALIP 2012 - 2012 International Conference on Audio, Language and Image Processing, Proceedings). doi: 10.1109/ICALIP.2012.6376764
