Distance-Based Weight Transfer for Fine-Tuning From Near-Field to Far-Field Speaker Verification

Li Zhang; Qing Wang; Hongji Wang; Yue Li; Wei Rao; Yannan Wang; Lei Xie

doi:10.1109/ICASSP49357.2023.10096790

Distance-Based Weight Transfer for Fine-Tuning From Near-Field to Far-Field Speaker Verification

Li Zhang, Qing Wang, Hongji Wang, Yue Li, Wei Rao, Yannan Wang, Lei Xie

计算机学院

科研成果: 期刊稿件 › 会议文章 › 同行评审

6 引用（Scopus）

摘要

The scarcity of labeled far-field speech is a constraint for training superior far-field speaker verification systems. In general, fine-tuning the model pre-trained on large-scale near- field speech through a small amount of far-field speech substantially outperforms training from scratch. However, the vanilla fine-tuning suffers from two limitations - catastrophic forgetting and overfitting. In this paper, we propose a weight transfer regularization (WTR) loss to constrain the distance of the weights between the pre-trained model and the fine-tuned model. With the WTR loss, the fine-tuning process takes advantage of the previously acquired discriminative ability from the large-scale near-field speech and avoids catastrophic for- getting. Meanwhile, the analysis based on the PAC-Bayes generalization theory indicates that the WTR loss makes the fine-tuned model have a tighter generalization bound, thus mitigating the overfitting problem. Moreover, three different norm distances for weight transfer are explored, which are L1-norm distance, L2-norm distance, and Max-norm distance. We evaluate the effectiveness of the WTR loss on VoxCeleb (pre-trained) and FFSVC (fine-tuned) datasets. Experimental results show that the distance-based weight transfer fine-tuning strategy significantly outperforms vanilla fine- tuning and other competitive domain adaptation methods.

源语言	英语
期刊	ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
DOI	https://doi.org/10.1109/ICASSP49357.2023.10096790
出版状态	已出版 - 2023
活动	48th IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2023 - Rhodes Island, 希腊期限: 4 6月 2023 → 10 6月 2023

访问文件

10.1109/ICASSP49357.2023.10096790

其它文件与链接

链接到 Scopus 的出版物

引用此

@article{f2c80badb55a4c7d8ba8705a30465609,

title = "Distance-Based Weight Transfer for Fine-Tuning From Near-Field to Far-Field Speaker Verification",

abstract = "The scarcity of labeled far-field speech is a constraint for training superior far-field speaker verification systems. In general, fine-tuning the model pre-trained on large-scale near- field speech through a small amount of far-field speech substantially outperforms training from scratch. However, the vanilla fine-tuning suffers from two limitations - catastrophic forgetting and overfitting. In this paper, we propose a weight transfer regularization (WTR) loss to constrain the distance of the weights between the pre-trained model and the fine-tuned model. With the WTR loss, the fine-tuning process takes advantage of the previously acquired discriminative ability from the large-scale near-field speech and avoids catastrophic for- getting. Meanwhile, the analysis based on the PAC-Bayes generalization theory indicates that the WTR loss makes the fine-tuned model have a tighter generalization bound, thus mitigating the overfitting problem. Moreover, three different norm distances for weight transfer are explored, which are L1-norm distance, L2-norm distance, and Max-norm distance. We evaluate the effectiveness of the WTR loss on VoxCeleb (pre-trained) and FFSVC (fine-tuned) datasets. Experimental results show that the distance-based weight transfer fine-tuning strategy significantly outperforms vanilla fine- tuning and other competitive domain adaptation methods.",

keywords = "far-field speaker verification, fine-tuning, weight transfer",

author = "Li Zhang and Qing Wang and Hongji Wang and Yue Li and Wei Rao and Yannan Wang and Lei Xie",

note = "Publisher Copyright: {\textcopyright} 2023 IEEE.; 48th IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2023 ; Conference date: 04-06-2023 Through 10-06-2023",

year = "2023",

doi = "10.1109/ICASSP49357.2023.10096790",

language = "英语",

journal = "ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings",

issn = "1520-6149",

}

TY - JOUR

T1 - Distance-Based Weight Transfer for Fine-Tuning From Near-Field to Far-Field Speaker Verification

AU - Zhang, Li

AU - Wang, Qing

AU - Wang, Hongji

AU - Li, Yue

AU - Rao, Wei

AU - Wang, Yannan

AU - Xie, Lei

PY - 2023

Y1 - 2023

N2 - The scarcity of labeled far-field speech is a constraint for training superior far-field speaker verification systems. In general, fine-tuning the model pre-trained on large-scale near- field speech through a small amount of far-field speech substantially outperforms training from scratch. However, the vanilla fine-tuning suffers from two limitations - catastrophic forgetting and overfitting. In this paper, we propose a weight transfer regularization (WTR) loss to constrain the distance of the weights between the pre-trained model and the fine-tuned model. With the WTR loss, the fine-tuning process takes advantage of the previously acquired discriminative ability from the large-scale near-field speech and avoids catastrophic for- getting. Meanwhile, the analysis based on the PAC-Bayes generalization theory indicates that the WTR loss makes the fine-tuned model have a tighter generalization bound, thus mitigating the overfitting problem. Moreover, three different norm distances for weight transfer are explored, which are L1-norm distance, L2-norm distance, and Max-norm distance. We evaluate the effectiveness of the WTR loss on VoxCeleb (pre-trained) and FFSVC (fine-tuned) datasets. Experimental results show that the distance-based weight transfer fine-tuning strategy significantly outperforms vanilla fine- tuning and other competitive domain adaptation methods.

AB - The scarcity of labeled far-field speech is a constraint for training superior far-field speaker verification systems. In general, fine-tuning the model pre-trained on large-scale near- field speech through a small amount of far-field speech substantially outperforms training from scratch. However, the vanilla fine-tuning suffers from two limitations - catastrophic forgetting and overfitting. In this paper, we propose a weight transfer regularization (WTR) loss to constrain the distance of the weights between the pre-trained model and the fine-tuned model. With the WTR loss, the fine-tuning process takes advantage of the previously acquired discriminative ability from the large-scale near-field speech and avoids catastrophic for- getting. Meanwhile, the analysis based on the PAC-Bayes generalization theory indicates that the WTR loss makes the fine-tuned model have a tighter generalization bound, thus mitigating the overfitting problem. Moreover, three different norm distances for weight transfer are explored, which are L1-norm distance, L2-norm distance, and Max-norm distance. We evaluate the effectiveness of the WTR loss on VoxCeleb (pre-trained) and FFSVC (fine-tuned) datasets. Experimental results show that the distance-based weight transfer fine-tuning strategy significantly outperforms vanilla fine- tuning and other competitive domain adaptation methods.

KW - far-field speaker verification

KW - fine-tuning

KW - weight transfer

UR - http://www.scopus.com/inward/record.url?scp=85144283341&partnerID=8YFLogxK

U2 - 10.1109/ICASSP49357.2023.10096790

DO - 10.1109/ICASSP49357.2023.10096790

M3 - 会议文章

AN - SCOPUS:85144283341

SN - 1520-6149

JO - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings

JF - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings

T2 - 48th IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2023

Y2 - 4 June 2023 through 10 June 2023

ER -

Distance-Based Weight Transfer for Fine-Tuning From Near-Field to Far-Field Speaker Verification

摘要

访问文件

其它文件与链接

指纹

引用此