A two stage mask estimation approach to robust speaker verification

Yali Zhao; Lei Xie; Zhonghua Fu

School of Computer Science

Northwestern Polytechnical University, Xi'an

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

We propose a two-stage mask estimation approach to robust speaker verification (SV) in noisy environments. We consider a practical semi-blind SV scenario: the location of the target speaker is fixed, while the locations of all interferers are unknown. In the first stage, we use a dual-microphone setup and a semi-blind version of the degenerate unmixing estimation technique (DUET) to estimate an initial binary mask. In the second stage, we refine the mask based on the time and frequency histograms of the initial mask. As a result, only highly reliable time-frequency components of the spectro-temporal features are kept for downstream verification. Experiments show that the proposed approach outperforms both a baseline MFCC approach and a recent local-SNR-based mask estimation approach.
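The two stages in the abstract can be illustrated with a short NumPy sketch. It is not the authors' implementation: this record gives neither the exact DUET parameterisation, the matching tolerances, nor the histogram-based refinement rule, so all of those are assumptions here. Stage 1 computes the standard DUET features for every time-frequency point (symmetric attenuation α = a − 1/a, with a = |X2/X1|, and relative delay δ = −∠(X2/X1)/ω) and, exploiting the semi-blind setting, keeps only points close to the target's known (α, δ). Stage 2 applies a hypothetical refinement: it builds a time histogram (share of reliable bins per frame) and a frequency histogram (share of reliable frames per bin) of the initial mask and discards frames and channels whose share falls below a threshold. The helper names `stft`, `duet_initial_mask` and `refine_mask` and all default thresholds are illustrative.

```python
import numpy as np


def stft(x, n_fft=512, hop=256):
    """One-sided STFT with a Hann window; returns shape (frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)


def duet_initial_mask(X1, X2, fs, n_fft, target_alpha, target_delay,
                      alpha_tol=0.3, delay_tol=3e-5, eps=1e-12):
    """Stage 1 sketch: mark a TF point reliable when its DUET features match the target.

    Semi-blind assumption: the target's symmetric attenuation (target_alpha) and
    relative delay in seconds (target_delay) are known from its fixed location.
    Interferers are never modelled; any point far from the target is rejected.
    Delays are only unambiguous while |omega * delay| < pi (closely spaced mics).
    """
    omega = 2 * np.pi * np.fft.rfftfreq(n_fft, d=1.0 / fs)   # rad/s for each bin
    omega[0] = np.inf                                          # DC carries no delay information
    ratio = (X2 + eps) / (X1 + eps)                            # ~= a * exp(-j * omega * delay)
    a = np.maximum(np.abs(ratio), eps)
    alpha = a - 1.0 / a                                        # symmetric attenuation, 0 when a == 1
    delay = -np.angle(ratio) / omega                           # relative delay in seconds
    return (np.abs(alpha - target_alpha) < alpha_tol) & (np.abs(delay - target_delay) < delay_tol)


def refine_mask(mask, min_frame_ratio=0.2, min_bin_ratio=0.2):
    """Stage 2 sketch (hypothetical rule): prune using time and frequency histograms.

    The time histogram is the share of reliable bins in each frame and the
    frequency histogram is the share of reliable frames in each bin. Frames or
    bins whose share falls below the threshold become unreliable everywhere.
    """
    time_hist = mask.mean(axis=1)
    freq_hist = mask.mean(axis=0)
    keep = (time_hist >= min_frame_ratio)[:, None] & (freq_hist >= min_bin_ratio)[None, :]
    return mask & keep


# Toy two-microphone scene: mic 2 hears the target 1 sample later and a weaker
# interferer 3 samples earlier (np.roll applies a circular integer-sample shift).
rng = np.random.default_rng(0)
fs, n_fft = 16000, 512
target = rng.standard_normal(fs)
interferer = 0.3 * rng.standard_normal(fs)
mic1 = target + interferer
mic2 = np.roll(target, 1) + np.roll(interferer, -3)

X1, X2 = stft(mic1, n_fft), stft(mic2, n_fft)
initial = duet_initial_mask(X1, X2, fs, n_fft, target_alpha=0.0, target_delay=1.0 / fs)
refined = refine_mask(initial)
print(initial.shape, round(initial.mean(), 3), round(refined.mean(), 3))
```

Thresholding each point against the target's known (α, δ), rather than clustering a full DUET histogram, is a simplification chosen for the semi-blind case, where only the target's location is known. It also assumes closely spaced microphones, so that inter-microphone phase differences do not wrap.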

Original language	English
Title of host publication	13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012
Pages	2653-2656
Number of pages	4
State	Published - 2012
Event	13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012 - Portland, OR, United States. Duration: 9 Sep 2012 → 13 Sep 2012

Publication series

Name	13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012
Volume	3

Conference

Conference	13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012
Country/Territory	United States
City	Portland, OR
Period	9 Sep 2012 → 13 Sep 2012

Keywords

Binary mask estimation
Dual-microphone
Missing feature theory
Speaker verification
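Missing feature theory, listed among the keywords, is what lets the verifier use only the reliable time-frequency components that the mask keeps. One common way to do this is marginalisation: with a diagonal-covariance GMM, unreliable feature dimensions are simply integrated out. The sketch below assumes a GMM-UBM back end scored on spectral-domain features. This record does not say which verifier or scoring rule the paper uses, and the function names `masked_gmm_loglik` and `verification_score` are hypothetical.

```python
import numpy as np


def masked_gmm_loglik(frames, mask, weights, means, variances):
    """Per-frame log-likelihood of a diagonal-covariance GMM with marginalisation.

    frames, mask: (T, D); weights: (K,); means, variances: (K, D).
    Each Gaussian factorises over dimensions, so integrating out a missing
    dimension removes its factor: only entries with mask == True contribute.
    """
    m = mask[:, None, :].astype(float)                             # (T, 1, D)
    diff2 = (frames[:, None, :] - means[None, :, :]) ** 2           # (T, K, D)
    per_dim = -0.5 * (np.log(2 * np.pi * variances)[None] + diff2 / variances[None])
    comp = np.log(weights)[None, :] + (per_dim * m).sum(axis=2)    # (T, K)
    top = comp.max(axis=1, keepdims=True)                          # log-sum-exp over components
    return top[:, 0] + np.log(np.exp(comp - top).sum(axis=1))


def verification_score(frames, mask, speaker_gmm, ubm):
    """Mean speaker-vs-UBM log-likelihood ratio over frames with any reliable entry."""
    usable = mask.any(axis=1)
    if not usable.any():
        raise ValueError("mask marks every frame unreliable")
    f, m = frames[usable], mask[usable]
    llr = masked_gmm_loglik(f, m, *speaker_gmm) - masked_gmm_loglik(f, m, *ubm)
    return float(llr.mean())


# One-component toy models over two log-spectral channels (illustrative values).
speaker = (np.array([1.0]), np.array([[1.0, 1.0]]), np.ones((1, 2)))
ubm = (np.array([1.0]), np.zeros((1, 2)), np.ones((1, 2)))
frames = np.array([[1.0, 5.0]])                  # channel 1 is corrupted by noise
score_full = verification_score(frames, np.array([[True, True]]), speaker, ubm)
score_masked = verification_score(frames, np.array([[True, False]]), speaker, ubm)
print(score_full, score_masked)
```

In the toy example the corrupted channel inflates the full-mask score to 5.0, while marginalising it out leaves 0.5, the evidence from the clean channel alone (up to floating-point rounding). Marginalisation needs feature dimensions that correspond to time-frequency cells, which is why it is usually applied to log-spectral features rather than MFCCs, where the DCT spreads each cell across all coefficients.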

Cite this

@inproceedings{a42945a89eb143cca7ff4f9d20a1c40c,

title = "A two stage mask estimation approach to robust speaker verification",

abstract = "We propose a two-stage mask estimation approach to robust speaker verification (SV) in noise environments. We consider a practical semi-blind SV scenario: the location of the target speaker is fixed while the locations of all interferers are unknown. In the first stage, we use a dual-microphone and a semi-blind degenerate unmixing estimation technique (DUET) to estimate an initial binary mask. In the second stage, we refine the mask based on the time and frequency histograms of the initial mask. As a result, only highly reliable time-frequency components in the spectro-temporal features are kept for downstream verification. Experiments show that the proposed approach is superior to a baseline MFCC approach and a recent local SNR based mask estimation approach.",

keywords = "Binary mask estimation, Dual-microphone, Missing feature theory, Speaker verification",

author = "Yali Zhao and Lei Xie and Zhonghua Fu",

year = "2012",

language = "English",

isbn = "9781622767595",

series = "13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012",

pages = "2653--2656",

booktitle = "13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012",

note = "13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012 ; Conference date: 09-09-2012 Through 13-09-2012",

}

Zhao, Y, Xie, L & Fu, Z 2012, A two stage mask estimation approach to robust speaker verification. in 13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012. 13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012, vol. 3, pp. 2653-2656, 13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012, Portland, OR, United States, 9/09/12.

A two stage mask estimation approach to robust speaker verification. / Zhao, Yali; Xie, Lei; Fu, Zhonghua.
13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012. 2012. p. 2653-2656 (13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012; Vol. 3).

TY - GEN

T1 - A two stage mask estimation approach to robust speaker verification

AU - Zhao, Yali

AU - Xie, Lei

AU - Fu, Zhonghua

PY - 2012

Y1 - 2012

N2 - We propose a two-stage mask estimation approach to robust speaker verification (SV) in noise environments. We consider a practical semi-blind SV scenario: the location of the target speaker is fixed while the locations of all interferers are unknown. In the first stage, we use a dual-microphone and a semi-blind degenerate unmixing estimation technique (DUET) to estimate an initial binary mask. In the second stage, we refine the mask based on the time and frequency histograms of the initial mask. As a result, only highly reliable time-frequency components in the spectro-temporal features are kept for downstream verification. Experiments show that the proposed approach is superior to a baseline MFCC approach and a recent local SNR based mask estimation approach.

AB - We propose a two-stage mask estimation approach to robust speaker verification (SV) in noise environments. We consider a practical semi-blind SV scenario: the location of the target speaker is fixed while the locations of all interferers are unknown. In the first stage, we use a dual-microphone and a semi-blind degenerate unmixing estimation technique (DUET) to estimate an initial binary mask. In the second stage, we refine the mask based on the time and frequency histograms of the initial mask. As a result, only highly reliable time-frequency components in the spectro-temporal features are kept for downstream verification. Experiments show that the proposed approach is superior to a baseline MFCC approach and a recent local SNR based mask estimation approach.

KW - Binary mask estimation

KW - Dual-microphone

KW - Missing feature theory

KW - Speaker verification

UR - http://www.scopus.com/inward/record.url?scp=84878523326&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:84878523326

SN - 9781622767595

T3 - 13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012

SP - 2653

EP - 2656

BT - 13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012

T2 - 13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012

Y2 - 9 September 2012 through 13 September 2012

ER -
