A segmental DNN/i-vector approach for digit-prompted speaker verification

Jie Yan; Xie Lei; Guangsen Wang; Zhong Hua Fu

doi:10.1109/APSIPA.2017.8281992

School of Astronautics

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

2 Scopus citations

Abstract

DNN/i-vectors have achieved state-of-the-art performance in text-independent speaker verification systems. For such systems, the UBM posteriors are replaced with the DNN posteriors when training the i-vector extractor to better model the phonetic space. However, the DNN/i-vector systems have limited success on text-dependent speaker verification systems as the lexical variabilities, which are important for such applications, are suppressed in the utterance-level i-vectors. In this paper, we propose a segmental DNN/i-vector approach for the digit-prompted speaker verification task. Specifically, we segment the utterance into digits and model each digit using an individual DNN/i-vector system. By modeling the variability for each digit independently, we can focus more on the speaker characteristics for each digit. To take into consideration the uncertainties in the DNN posteriors, we propose a confidence measure based weighting method. On the RSR2015 dataset, the proposed approach yields an equal error rate of 3.44%, compared to 5.76% of the baseline utterance-level DNN/i-vector system and 4.54% of the joint factor analysis (JFA) system.
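The three equal error rates quoted above can also be read as relative reductions. The short sketch below only reworks the figures stated in the abstract; the function and constant names are illustrative and do not come from the paper.

```python
def relative_reduction(baseline_eer: float, proposed_eer: float) -> float:
    """Return the relative EER reduction (in percent) of proposed over baseline."""
    return 100.0 * (baseline_eer - proposed_eer) / baseline_eer


# Equal error rates (in percent) on RSR2015, as reported in the abstract.
PROPOSED_SEGMENTAL = 3.44
BASELINE_DNN_IVECTOR = 5.76  # utterance-level DNN/i-vector system
BASELINE_JFA = 4.54          # joint factor analysis system

if __name__ == "__main__":
    vs_dnn = relative_reduction(BASELINE_DNN_IVECTOR, PROPOSED_SEGMENTAL)
    vs_jfa = relative_reduction(BASELINE_JFA, PROPOSED_SEGMENTAL)
    print(f"vs. utterance-level DNN/i-vector: {vs_dnn:.1f}% relative")
    print(f"vs. JFA: {vs_jfa:.1f}% relative")
```

Run as a script, this reports a relative EER reduction of roughly 40.3% against the utterance-level DNN/i-vector baseline and roughly 24.2% against the JFA system.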

Original language	English
Title of host publication	Proceedings - 9th Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2017
Publisher	Institute of Electrical and Electronics Engineers Inc.
Pages	1-5
Number of pages	5
ISBN (Electronic)	9781538615423
DOIs	https://doi.org/10.1109/APSIPA.2017.8281992
State	Published - 2 Jul 2017
Event	9th Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2017 - Kuala Lumpur, Malaysia Duration: 12 Dec 2017 → 15 Dec 2017

Publication series

Name	Proceedings - 9th Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2017
Volume	2018-February

Conference

Conference	9th Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2017
Country/Territory	Malaysia
City	Kuala Lumpur
Period	12/12/17 → 15/12/17

Access to Document

10.1109/APSIPA.2017.8281992

Cite this

Yan, J., Lei, X., Wang, G., & Fu, Z. H. (2017). A segmental DNN/i-vector approach for digit-prompted speaker verification. In Proceedings - 9th Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2017 (pp. 1-5). (Proceedings - 9th Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2017; Vol. 2018-February). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/APSIPA.2017.8281992

Yan, Jie ; Lei, Xie ; Wang, Guangsen et al. / A segmental DNN/i-vector approach for digit-prompted speaker verification. Proceedings - 9th Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2017. Institute of Electrical and Electronics Engineers Inc., 2017. pp. 1-5 (Proceedings - 9th Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2017).

@inproceedings{dd849b0d60824dbdb2c916884b4cb0ee,

title = "A segmental DNN/i-vector approach for digit-prompted speaker verification",

abstract = "DNN/i-vectors have achieved state-of-the-art performance in text-independent speaker verification systems. For such systems, the UBM posteriors are replaced with the DNN posteriors when training the i-vector extractor to better model the phonetic space. However, the DNN/i-vector systems have limited success on text-dependent speaker verification systems as the lexical variabilities, which are important for such applications, are suppressed in the utterance-level i-vectors. In this paper, we propose a segmental DNN/i-vector approach for the digit-prompted speaker verification task. Specifically, we segment the utterance into digits and model each digit using an individual DNN/i-vector system. By modeling the variability for each digit independently, we can focus more on the speaker characteristics for each digit. To take into consideration the uncertainties in the DNN posteriors, we propose a confidence measure based weighting method. On the RSR2015 dataset, the proposed approach yields an equal error rate of 3.44%, compared to 5.76% of the baseline utterance-level DNN/i-vector system and 4.54% of the joint factor analysis (JFA) system.",

author = "Jie Yan and Xie Lei and Guangsen Wang and Fu, {Zhong Hua}",

note = "Publisher Copyright: {\textcopyright} 2017 IEEE.; 9th Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2017 ; Conference date: 12-12-2017 Through 15-12-2017",

year = "2017",

month = jul,

day = "2",

doi = "10.1109/APSIPA.2017.8281992",

language = "English",

series = "Proceedings - 9th Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2017",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

pages = "1--5",

booktitle = "Proceedings - 9th Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2017",

}

Yan, J, Lei, X, Wang, G & Fu, ZH 2017, A segmental DNN/i-vector approach for digit-prompted speaker verification. in Proceedings - 9th Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2017. Proceedings - 9th Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2017, vol. 2018-February, Institute of Electrical and Electronics Engineers Inc., pp. 1-5, 9th Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2017, Kuala Lumpur, Malaysia, 12/12/17. https://doi.org/10.1109/APSIPA.2017.8281992

A segmental DNN/i-vector approach for digit-prompted speaker verification. / Yan, Jie; Lei, Xie; Wang, Guangsen et al.
Proceedings - 9th Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2017. Institute of Electrical and Electronics Engineers Inc., 2017. p. 1-5 (Proceedings - 9th Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2017; Vol. 2018-February).

TY - GEN

T1 - A segmental DNN/i-vector approach for digit-prompted speaker verification

AU - Yan, Jie

AU - Lei, Xie

AU - Wang, Guangsen

AU - Fu, Zhong Hua

PY - 2017/7/2

Y1 - 2017/7/2

N2 - DNN/i-vectors have achieved state-of-the-art performance in text-independent speaker verification systems. For such systems, the UBM posteriors are replaced with the DNN posteriors when training the i-vector extractor to better model the phonetic space. However, the DNN/i-vector systems have limited success on text-dependent speaker verification systems as the lexical variabilities, which are important for such applications, are suppressed in the utterance-level i-vectors. In this paper, we propose a segmental DNN/i-vector approach for the digit-prompted speaker verification task. Specifically, we segment the utterance into digits and model each digit using an individual DNN/i-vector system. By modeling the variability for each digit independently, we can focus more on the speaker characteristics for each digit. To take into consideration the uncertainties in the DNN posteriors, we propose a confidence measure based weighting method. On the RSR2015 dataset, the proposed approach yields an equal error rate of 3.44%, compared to 5.76% of the baseline utterance-level DNN/i-vector system and 4.54% of the joint factor analysis (JFA) system.

AB - DNN/i-vectors have achieved state-of-the-art performance in text-independent speaker verification systems. For such systems, the UBM posteriors are replaced with the DNN posteriors when training the i-vector extractor to better model the phonetic space. However, the DNN/i-vector systems have limited success on text-dependent speaker verification systems as the lexical variabilities, which are important for such applications, are suppressed in the utterance-level i-vectors. In this paper, we propose a segmental DNN/i-vector approach for the digit-prompted speaker verification task. Specifically, we segment the utterance into digits and model each digit using an individual DNN/i-vector system. By modeling the variability for each digit independently, we can focus more on the speaker characteristics for each digit. To take into consideration the uncertainties in the DNN posteriors, we propose a confidence measure based weighting method. On the RSR2015 dataset, the proposed approach yields an equal error rate of 3.44%, compared to 5.76% of the baseline utterance-level DNN/i-vector system and 4.54% of the joint factor analysis (JFA) system.

UR - http://www.scopus.com/inward/record.url?scp=85050815908&partnerID=8YFLogxK

U2 - 10.1109/APSIPA.2017.8281992

DO - 10.1109/APSIPA.2017.8281992

M3 - Conference contribution

AN - SCOPUS:85050815908

T3 - Proceedings - 9th Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2017

SP - 1

EP - 5

BT - Proceedings - 9th Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2017

PB - Institute of Electrical and Electronics Engineers Inc.

T2 - 9th Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2017

Y2 - 12 December 2017 through 15 December 2017

ER -

Yan J, Lei X, Wang G, Fu ZH. A segmental DNN/i-vector approach for digit-prompted speaker verification. In Proceedings - 9th Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2017. Institute of Electrical and Electronics Engineers Inc. 2017. p. 1-5. (Proceedings - 9th Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2017). doi: 10.1109/APSIPA.2017.8281992
