TY - JOUR
T1 - The NPU-ASLP System for Deepfake Algorithm Recognition in ADD 2023 Challenge
AU - Wang, Ziqian
AU - Wang, Qing
AU - Yao, Jixun
AU - Xie, Lei
N1 - Publisher Copyright: © 2023 CEUR-WS. All rights reserved.
PY - 2023
Y1 - 2023
AB - This paper describes our NPU-ASLP system for the Deepfake Algorithm Recognition (AR) task in the Audio Deepfake Detection 2023 Challenge. This task is an open-set classification problem focusing on identifying the specific algorithms used to create the deepfake speech utterances. In this task, we introduce a deepfake AR system with contributions in data augmentation, model architecture, fine-tuning strategy, and model ensemble. We first generate training data by applying various data augmentation techniques to the deepfake speech. We then utilize ResNet101 and a long-term temporal-frequency transformer module to better capture audio context dependencies. Moreover, we employ pre-trained WavLM for better feature extraction. Additionally, our content-invariant fine-tuning strategy improves performance. Finally, model ensemble with different representation combinations further enhances performance. Experiments show that our system achieves an F1-score of 0.7355 on the evaluation set, and ranks fourth in the challenge.
KW - Deepfake algorithm recognition
KW - data augmentation
KW - model ensemble
KW - transformer
UR - http://www.scopus.com/inward/record.url?scp=85181148066&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:85181148066
SN - 1613-0073
VL - 3597
SP - 64
EP - 69
JO - CEUR Workshop Proceedings
JF - CEUR Workshop Proceedings
T2 - 2023 Workshop on Deepfake Audio Detection and Analysis, DADA 2023
Y2 - 19 August 2023
ER -