Abstract
This paper describes our NPU-ASLP system for the Deepfake Algorithm Recognition (AR) task in the Audio Deepfake Detection 2023 Challenge. The task is an open-set classification problem that focuses on identifying the specific algorithm used to create each deepfake speech utterance. We introduce a deepfake AR system with contributions in data augmentation, model architecture, fine-tuning strategy, and model ensembling. We first generate training data by applying various data augmentation techniques to the deepfake speech. We then use ResNet101 and a long-term temporal-frequency transformer module to better capture audio context dependencies, and employ pre-trained WavLM for better feature extraction. Our content-invariant fine-tuning strategy further improves performance. Finally, ensembling models that use different representation combinations yields an additional gain. Experiments show that our system achieves an F1-score of 0.7355 on the evaluation set and ranks fourth in the challenge.
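To make the open-set and ensemble aspects concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not state how model outputs are fused or how unseen algorithms are detected; this example assumes simple probability averaging across models and a confidence threshold for rejection. The names `ensemble_predict`, `UNKNOWN`, and the threshold value of 0.5 are hypothetical.

```python
from typing import List, Sequence

# Hypothetical label for "generation algorithm not seen in training".
UNKNOWN = -1


def ensemble_predict(
    model_probs: Sequence[Sequence[Sequence[float]]],
    threshold: float = 0.5,
) -> List[int]:
    """Average class probabilities over models, then reject low-confidence items.

    model_probs[m][u][c] is the probability that model m assigns to class c
    for utterance u. Averaging and thresholding are assumptions made for
    illustration; the source does not specify the fusion or rejection rule.
    """
    n_models = len(model_probs)
    n_utts = len(model_probs[0])
    predictions = []
    for u in range(n_utts):
        n_classes = len(model_probs[0][u])
        # Mean probability per class across all models in the ensemble.
        avg = [
            sum(model_probs[m][u][c] for m in range(n_models)) / n_models
            for c in range(n_classes)
        ]
        best = max(range(n_classes), key=lambda c: avg[c])
        # Open-set step: fall back to UNKNOWN when no class is confident enough.
        predictions.append(best if avg[best] >= threshold else UNKNOWN)
    return predictions


# Two hypothetical models scoring two utterances over three known algorithms.
probs_a = [[0.9, 0.05, 0.05], [0.4, 0.3, 0.3]]
probs_b = [[0.7, 0.2, 0.1], [0.3, 0.4, 0.3]]
result = ensemble_predict([probs_a, probs_b])
```

In this toy input the first utterance is assigned to class 0, while the second has no averaged class probability above the threshold and is therefore labeled `UNKNOWN`. The threshold would in practice be tuned on held-out data.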
Original language | English |
---|---|
Pages (from-to) | 64-69 |
Number of pages | 6 |
Journal | CEUR Workshop Proceedings |
Volume | 3597 |
State | Published - 2023 |
Event | 2023 Workshop on Deepfake Audio Detection and Analysis, DADA 2023 - Macao, China. Duration: 19 Aug 2023 → … |
Keywords
- data augmentation
- deepfake algorithm recognition
- model ensemble
- transformer