A two stage mask estimation approach to robust speaker verification

Yali Zhao; Lei Xie; Zhonghua Fu

School of Computer Science

Northwestern Polytechnical University Xian

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Peer-reviewed

Abstract

We propose a two-stage mask estimation approach to robust speaker verification (SV) in noise environments. We consider a practical semi-blind SV scenario: the location of the target speaker is fixed while the locations of all interferers are unknown. In the first stage, we use a dual-microphone and a semi-blind degenerate unmixing estimation technique (DUET) to estimate an initial binary mask. In the second stage, we refine the mask based on the time and frequency histograms of the initial mask. As a result, only highly reliable time-frequency components in the spectro-temporal features are kept for downstream verification. Experiments show that the proposed approach is superior to a baseline MFCC approach and a recent local SNR based mask estimation approach.
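The two stages described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract gives no formulas, so the delay-based DUET criterion, the function names `initial_mask` and `refine_mask`, and the thresholds `tol_s`, `min_frame_frac` and `min_freq_frac` are all assumptions made for the example. In particular, the refinement rule (discarding frames and frequency channels whose share of retained bins is low) is only one plausible reading of "time and frequency histograms of the initial mask".

```python
import numpy as np


def initial_mask(X1, X2, freqs_hz, target_delay_s, tol_s):
    """Stage 1 (assumed form): DUET-style binary mask from inter-channel delay.

    X1, X2: complex STFTs of the two microphones, shape (n_freq, n_frames).
    A time-frequency bin is kept when its estimated relative delay lies
    within tol_s of the known target delay (semi-blind: target location fixed).
    """
    eps = 1e-12
    freqs_hz = np.asarray(freqs_hz, dtype=float)
    ratio = (X2 + eps) / (X1 + eps)
    omega = 2.0 * np.pi * freqs_hz[:, None]
    safe_omega = np.where(omega > 0, omega, 1.0)  # avoid division by zero at DC
    # DUET models X2/X1 = a * exp(-1j * omega * delay), so delay = -angle / omega.
    delay = -np.angle(ratio) / safe_omega
    mask = np.abs(delay - target_delay_s) < tol_s
    mask[freqs_hz <= 0, :] = False  # the DC bin carries no delay information
    return mask


def refine_mask(mask, min_frame_frac=0.3, min_freq_frac=0.3):
    """Stage 2 (assumed form): prune the mask using its time/frequency histograms.

    The time histogram is the fraction of kept bins in each frame; the
    frequency histogram is the fraction of kept bins in each channel.
    Bins in sparsely populated frames or channels are treated as unreliable.
    """
    mask = np.asarray(mask, dtype=bool)
    time_hist = mask.mean(axis=0)  # one value per frame
    freq_hist = mask.mean(axis=1)  # one value per frequency channel
    keep_frames = time_hist >= min_frame_frac
    keep_freqs = freq_hist >= min_freq_frac
    return mask & keep_frames[None, :] & keep_freqs[:, None]
```

In a missing-feature setup, the refined mask would then mark which spectro-temporal components are passed to the verification back end. The phase-based delay estimate above is only unambiguous while the product of angular frequency and delay stays below π, which limits the usable microphone spacing or frequency range in this sketch.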

Original language	English
Title of host publication	13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012
Pages	2653-2656
Number of pages	4
Publication status	Published - 2012
Event	13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012 - Portland, OR, United States; Duration: 9 Sept 2012 → 13 Sept 2012

Publication series

Name	13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012
Volume	3

Conference

Conference	13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012
Country/Territory	United States
City	Portland, OR
Period	9/09/12 → 13/09/12

Cite this

@inproceedings{a42945a89eb143cca7ff4f9d20a1c40c,
title = "A two stage mask estimation approach to robust speaker verification",
abstract = "We propose a two-stage mask estimation approach to robust speaker verification (SV) in noise environments. We consider a practical semi-blind SV scenario: the location of the target speaker is fixed while the locations of all interferers are unknown. In the first stage, we use a dual-microphone and a semi-blind degenerate unmixing estimation technique (DUET) to estimate an initial binary mask. In the second stage, we refine the mask based on the time and frequency histograms of the initial mask. As a result, only highly reliable time-frequency components in the spectro-temporal features are kept for downstream verification. Experiments show that the proposed approach is superior to a baseline MFCC approach and a recent local SNR based mask estimation approach.",
keywords = "Binary mask estimation, Dual-microphone, Missing feature theory, Speaker verification",
author = "Yali Zhao and Lei Xie and Zhonghua Fu",
year = "2012",
language = "English",
isbn = "9781622767595",
series = "13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012",
pages = "2653--2656",
booktitle = "13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012",
note = "13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012 ; Conference date: 09-09-2012 Through 13-09-2012",
}

Zhao, Y, Xie, L & Fu, Z 2012, A two stage mask estimation approach to robust speaker verification. in 13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012. 13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012, vol. 3, pp. 2653-2656, 13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012, Portland, OR, United States, 9/09/12.

A two stage mask estimation approach to robust speaker verification. / Zhao, Yali; Xie, Lei; Fu, Zhonghua.
13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012. 2012. pp. 2653-2656 (13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012; vol. 3).

TY - GEN
T1 - A two stage mask estimation approach to robust speaker verification
AU - Zhao, Yali
AU - Xie, Lei
AU - Fu, Zhonghua
PY - 2012
Y1 - 2012
N2 - We propose a two-stage mask estimation approach to robust speaker verification (SV) in noise environments. We consider a practical semi-blind SV scenario: the location of the target speaker is fixed while the locations of all interferers are unknown. In the first stage, we use a dual-microphone and a semi-blind degenerate unmixing estimation technique (DUET) to estimate an initial binary mask. In the second stage, we refine the mask based on the time and frequency histograms of the initial mask. As a result, only highly reliable time-frequency components in the spectro-temporal features are kept for downstream verification. Experiments show that the proposed approach is superior to a baseline MFCC approach and a recent local SNR based mask estimation approach.
AB - We propose a two-stage mask estimation approach to robust speaker verification (SV) in noise environments. We consider a practical semi-blind SV scenario: the location of the target speaker is fixed while the locations of all interferers are unknown. In the first stage, we use a dual-microphone and a semi-blind degenerate unmixing estimation technique (DUET) to estimate an initial binary mask. In the second stage, we refine the mask based on the time and frequency histograms of the initial mask. As a result, only highly reliable time-frequency components in the spectro-temporal features are kept for downstream verification. Experiments show that the proposed approach is superior to a baseline MFCC approach and a recent local SNR based mask estimation approach.
KW - Binary mask estimation
KW - Dual-microphone
KW - Missing feature theory
KW - Speaker verification
UR - http://www.scopus.com/inward/record.url?scp=84878523326&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:84878523326
SN - 9781622767595
T3 - 13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012
SP - 2653
EP - 2656
BT - 13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012
T2 - 13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012
Y2 - 9 September 2012 through 13 September 2012
ER -
