NPU speaker verification system for interspeech 2020 far-field speaker verification challenge

Li Zhang; Jian Wu; Lei Xie

doi:10.21437/Interspeech.2020-2688

School of Computer Science

Northwestern Polytechnical University, Xi'an

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Peer-reviewed

8 Citations (Scopus)

Abstract

This paper describes the NPU system submitted to the Interspeech 2020 Far-Field Speaker Verification Challenge (FFSVC). We focus on far-field text-dependent speaker verification (SV) with a single microphone array (Task 1) and with multiple microphone arrays (Task 3). The major challenges in these scenarios are short utterances and the cross-channel and distance mismatch between enrollment and test. Believing that better speaker embeddings can alleviate the effects of short utterances, we introduce a new speaker embedding architecture, ResNet-BAM, which integrates a bottleneck attention module (BAM) with ResNet as a simple and efficient way to further improve the representation power of ResNet. This contribution brings up to a 1% reduction in equal error rate (EER). We further address the mismatch problem in three directions. First, domain adversarial training, which aims to learn domain-invariant features, can yield a 0.8% EER reduction. Second, front-end signal processing, including weighted prediction error (WPE) dereverberation and beamforming, brings no obvious gain on its own, but together with data selection and domain adversarial training it can contribute a further 0.5% EER reduction. Finally, data augmentation combined with a specifically designed data selection strategy can lead to a 2% EER reduction. With all of the above contributions, our single submission system (without multi-system fusion) ranked first on Task 1 and second on Task 3 in the mid-challenge results.
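Every gain above is reported as an EER reduction. As a hedged illustration of that metric only (not the authors' scoring tool and not the official FFSVC scorer), the sketch below estimates the EER from trial scores and target/non-target labels by sweeping a decision threshold down through the sorted scores. The function name `compute_eer` and its label convention (1 = same speaker, 0 = different speaker) are hypothetical choices for this example.

```python
from typing import Sequence


def compute_eer(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Approximate equal error rate (EER) of a set of verification trials.

    scores: similarity scores; higher means "more likely the same speaker".
    labels: 1 for target (same-speaker) trials, 0 for non-target trials.
    Returns the EER as a fraction in [0, 1].
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    n_target = sum(1 for y in labels if y == 1)
    n_nontarget = len(labels) - n_target
    if n_target == 0 or n_nontarget == 0:
        raise ValueError("need at least one target and one non-target trial")

    # Sort trials by descending score, so lowering the threshold past each
    # distinct score accepts every trial that has that score.
    trials = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)

    # Threshold above every score: nothing is accepted (FAR = 0, FRR = 1).
    false_accepts = 0
    misses = n_target
    best_gap = 1.0
    eer = 0.5

    i = 0
    while i < len(trials):
        current = trials[i][0]
        # Accept all trials tied at the current score together.
        while i < len(trials) and trials[i][0] == current:
            if trials[i][1] == 1:
                misses -= 1
            else:
                false_accepts += 1
            i += 1
        far = false_accepts / n_nontarget  # false-acceptance rate
        frr = misses / n_target  # false-rejection (miss) rate
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the operating point where FAR and FRR are closest.
            best_gap = gap
            eer = (far + frr) / 2
    return eer
```

For example, `compute_eer([0.9, 0.8, 0.7, 0.4, 0.5, 0.3, 0.2, 0.1], [1, 1, 1, 1, 0, 0, 0, 0])` returns 0.25: one target scores below one non-target, so at the best threshold one of four targets is rejected and one of four non-targets is accepted. The sketch reports the midpoint at the closest FAR/FRR operating point; interpolated variants of the EER also exist and can differ slightly on small trial lists.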

Original language	English
Host publication title	Interspeech 2020
Publisher	International Speech Communication Association
Pages	3471-3475
Number of pages	5
ISBN (Print)	9781713820697
DOI	https://doi.org/10.21437/Interspeech.2020-2688
Publication status	Published - 2020
Event	21st Annual Conference of the International Speech Communication Association, INTERSPEECH 2020 - Shanghai, China. Duration: 25 Oct 2020 → 29 Oct 2020

Publication series

Name	Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume	2020-October
ISSN (Print)	2308-457X
ISSN (Electronic)	1990-9772

Conference

Conference	21st Annual Conference of the International Speech Communication Association, INTERSPEECH 2020
Country/Territory	China
City	Shanghai
Period	25/10/20 → 29/10/20

Cite this

Zhang, L., Wu, J., & Xie, L. (2020). NPU speaker verification system for interspeech 2020 far-field speaker verification challenge. In Interspeech 2020 (pp. 3471-3475). (Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH; Vol. 2020-October). International Speech Communication Association. https://doi.org/10.21437/Interspeech.2020-2688

@inproceedings{93eff081b224440cad7c5c912a3dbed2,

title = "NPU speaker verification system for interspeech 2020 far-field speaker verification challenge",

keywords = "Data augmentation, Domain adversarial training, Far-field, Speaker verification",

author = "Li Zhang and Jian Wu and Lei Xie",

note = "Publisher Copyright: {\textcopyright} 2020 ISCA; 21st Annual Conference of the International Speech Communication Association, INTERSPEECH 2020 ; Conference date: 25-10-2020 Through 29-10-2020",

year = "2020",

doi = "10.21437/Interspeech.2020-2688",

language = "English",

isbn = "9781713820697",

series = "Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH",

publisher = "International Speech Communication Association",

pages = "3471--3475",

booktitle = "Interspeech 2020",

}

Zhang, L, Wu, J & Xie, L 2020, NPU speaker verification system for interspeech 2020 far-field speaker verification challenge. in Interspeech 2020. Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, vol. 2020-October, International Speech Communication Association, pp. 3471-3475, 21st Annual Conference of the International Speech Communication Association, INTERSPEECH 2020, Shanghai, China, 25/10/20. https://doi.org/10.21437/Interspeech.2020-2688

NPU speaker verification system for interspeech 2020 far-field speaker verification challenge. / Zhang, Li; Wu, Jian; Xie, Lei.
Interspeech 2020. International Speech Communication Association, 2020. p. 3471-3475 (Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH; Vol. 2020-October).

TY - GEN

T1 - NPU speaker verification system for interspeech 2020 far-field speaker verification challenge

AU - Zhang, Li

AU - Wu, Jian

AU - Xie, Lei

PY - 2020

Y1 - 2020

KW - Data augmentation

KW - Domain adversarial training

KW - Far-field

KW - Speaker verification

UR - http://www.scopus.com/inward/record.url?scp=85098103062&partnerID=8YFLogxK

U2 - 10.21437/Interspeech.2020-2688

DO - 10.21437/Interspeech.2020-2688

M3 - Conference contribution

AN - SCOPUS:85098103062

SN - 9781713820697

T3 - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

SP - 3471

EP - 3475

BT - Interspeech 2020

PB - International Speech Communication Association

T2 - 21st Annual Conference of the International Speech Communication Association, INTERSPEECH 2020

Y2 - 25 October 2020 through 29 October 2020

ER -