TY - JOUR
T1 - Backend Ensemble for Speaker Verification and Spoofing Countermeasure
AU - Zhang, Li
AU - Li, Yue
AU - Zhao, Huan
AU - Wang, Qing
AU - Xie, Lei
N1 - Publisher Copyright: Copyright © 2022 ISCA.
PY - 2022
Y1 - 2022
N2 - This paper describes the NPU system submitted to the Spoofing Aware Speaker Verification (SASV) Challenge 2022. We focus in particular on the backend ensemble for speaker verification and spoofing countermeasure, from three aspects. First, besides simple concatenation, we propose circulant matrix transformation and stacking for speaker embeddings and countermeasure embeddings. By stacking the newly defined circulant embeddings, we explore almost all possible interactions between speaker embeddings and countermeasure embeddings. Second, we experiment with different convolutional neural networks that selectively fuse the salient regions of the embeddings into channels via convolution kernels. Finally, we design parallel attention in 1D convolutional neural networks to learn the global correlation along the channel dimension as well as the important parts along the feature dimension. Meanwhile, we embed squeeze-and-excitation attention in 2D convolutional neural networks to learn the global dependence between speaker embeddings and countermeasure embeddings. Experimental results show that all of the above methods are effective. After fusing four well-trained models enhanced by these methods, our best SASV-EER, SPF-EER and SV-EER on the evaluation set are 0.559%, 0.354% and 0.857%, respectively. With these contributions, our submitted system ranked fifth in the challenge.
AB - This paper describes the NPU system submitted to the Spoofing Aware Speaker Verification (SASV) Challenge 2022. We focus in particular on the backend ensemble for speaker verification and spoofing countermeasure, from three aspects. First, besides simple concatenation, we propose circulant matrix transformation and stacking for speaker embeddings and countermeasure embeddings. By stacking the newly defined circulant embeddings, we explore almost all possible interactions between speaker embeddings and countermeasure embeddings. Second, we experiment with different convolutional neural networks that selectively fuse the salient regions of the embeddings into channels via convolution kernels. Finally, we design parallel attention in 1D convolutional neural networks to learn the global correlation along the channel dimension as well as the important parts along the feature dimension. Meanwhile, we embed squeeze-and-excitation attention in 2D convolutional neural networks to learn the global dependence between speaker embeddings and countermeasure embeddings. Experimental results show that all of the above methods are effective. After fusing four well-trained models enhanced by these methods, our best SASV-EER, SPF-EER and SV-EER on the evaluation set are 0.559%, 0.354% and 0.857%, respectively. With these contributions, our submitted system ranked fifth in the challenge.
KW - backend ensemble
KW - speaker verification
KW - spoofing countermeasure
UR - http://www.scopus.com/inward/record.url?scp=85128605176&partnerID=8YFLogxK
U2 - 10.21437/Interspeech.2022-10259
DO - 10.21437/Interspeech.2022-10259
M3 - Conference article
AN - SCOPUS:85128605176
SN - 2308-457X
VL - 2022-September
SP - 4381
EP - 4385
JO - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
JF - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
T2 - 23rd Annual Conference of the International Speech Communication Association, INTERSPEECH 2022
Y2 - 18 September 2022 through 22 September 2022
ER -