The NPU-ASLP System for Deepfake Algorithm Recognition in ADD 2023 Challenge

Ziqian Wang; Qing Wang; Jixun Yao; Lei Xie

School of Computer Science, Northwestern Polytechnical University, Xi'an

Research output: Contribution to journal › Conference article › peer-review

2 Scopus citations

Abstract

This paper describes our NPU-ASLP system for the Deepfake Algorithm Recognition (AR) task in the Audio Deepfake Detection 2023 Challenge. This task is an open-set classification problem focusing on identifying the specific algorithms used to create the deepfake speech utterances. In this task, we introduce a deepfake AR system with contributions in data augmentation, model architecture, fine-tuning strategy, and model ensemble. We first generate training data by applying various data augmentation techniques to the deepfake speech. We then utilize ResNet101 and a long-term temporal-frequency transformer module to better capture audio context dependencies. Moreover, we employ pre-trained WavLM for better feature extraction. Additionally, our content-invariant fine-tuning strategy improves performance. Finally, model ensemble with different representation combinations further enhances performance. Experiments show that our system achieves an F1-score of 0.7355 on the evaluation set, and ranks fourth in the challenge.
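The abstract names three pieces that can be stated concretely. First, an open-set decision: a test utterance may come from an algorithm that is absent from training. Second, a model ensemble. Third, an F1-score. The Python sketch below shows one common way to connect them. It is not the NPU-ASLP implementation, and several details are assumptions rather than facts taken from the paper:

- The ensemble averages per-model class posteriors.
- An utterance is labelled unknown when its top fused score falls below a threshold.
- The metric is macro-averaged F1, with the unknown class counted as its own label.

The function names, the `UNKNOWN` label, and the threshold value are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence

# Hypothetical label for "generated by an algorithm not seen in training".
UNKNOWN = -1


def fuse_scores(model_probs: Sequence[Sequence[Sequence[float]]]) -> List[List[float]]:
    """Assumed score-level ensemble: average class posteriors over models.

    model_probs[m][i][c] is model m's probability that utterance i was
    produced by known algorithm c.
    """
    n_models = len(model_probs)
    n_utts = len(model_probs[0])
    n_classes = len(model_probs[0][0])
    return [
        [sum(model_probs[m][i][c] for m in range(n_models)) / n_models
         for c in range(n_classes)]
        for i in range(n_utts)
    ]


def open_set_predict(fused: Sequence[Sequence[float]], threshold: float) -> List[int]:
    """Assumed open-set rule: top known class, or UNKNOWN if its score < threshold."""
    preds = []
    for row in fused:
        best = max(range(len(row)), key=lambda c: row[c])
        preds.append(best if row[best] >= threshold else UNKNOWN)
    return preds


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Unweighted mean of per-class F1 over every label seen in y_true or y_pred.

    UNKNOWN is treated as an ordinary class here (an assumption).
    """
    tp: Dict[int, int] = defaultdict(int)
    fp: Dict[int, int] = defaultdict(int)
    fn: Dict[int, int] = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # class p was predicted wrongly
            fn[t] += 1  # true class t was missed
    labels = set(y_true) | set(y_pred)
    # Per-class F1 = 2TP / (2TP + FP + FN); every label occurs at least once,
    # so the denominator is never zero.
    f1s = [2 * tp[c] / (2 * tp[c] + fp[c] + fn[c]) for c in labels]
    return sum(f1s) / len(f1s)


# Toy usage: two hypothetical models, three utterances, three known algorithms.
model_a = [[0.8, 0.1, 0.1], [0.2, 0.7, 0.1], [0.4, 0.3, 0.3]]
model_b = [[0.6, 0.3, 0.1], [0.1, 0.8, 0.1], [0.3, 0.4, 0.3]]
fused = fuse_scores([model_a, model_b])
preds = open_set_predict(fused, threshold=0.5)  # threshold value is arbitrary
print(preds)  # [0, 1, -1]: the third utterance's top fused score is about 0.35
print(macro_f1([0, 1, UNKNOWN], preds))  # 1.0
```

Averaging posteriors is only one possible fusion choice. Weighted averaging or fusion at the embedding level are equally plausible readings of "model ensemble". In practice the threshold would normally be tuned on a development set.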

Original language	English
Pages (from-to)	64-69
Number of pages	6
Journal	CEUR Workshop Proceedings
Volume	3597
State	Published - 2023
Event	2023 Workshop on Deepfake Audio Detection and Analysis, DADA 2023, Macao, China; Duration: 19 Aug 2023 → …

Keywords

Deepfake algorithm recognition
data augmentation
model ensemble
transformer

Cite this

@article{5dd963c28f0443f693b4c12bfd8aae6d,
  title = "The {NPU-ASLP} System for Deepfake Algorithm Recognition in {ADD} 2023 Challenge",
  abstract = "This paper describes our NPU-ASLP system for the Deepfake Algorithm Recognition (AR) task in the Audio Deepfake Detection 2023 Challenge. This task is an open-set classification problem focusing on identifying the specific algorithms used to create the deepfake speech utterances. In this task, we introduce a deepfake AR system with contributions in data augmentation, model architecture, fine-tuning strategy, and model ensemble. We first generate training data by applying various data augmentation techniques to the deepfake speech. We then utilize ResNet101 and a long-term temporal-frequency transformer module to better capture audio context dependencies. Moreover, we employ pre-trained WavLM for better feature extraction. Additionally, our content-invariant fine-tuning strategy improves performance. Finally, model ensemble with different representation combinations further enhances performance. Experiments show that our system achieves an F1-score of 0.7355 on the evaluation set, and ranks fourth in the challenge.",
  keywords = "Deepfake algorithm recognition, data augmentation, model ensemble, transformer",
  author = "Ziqian Wang and Qing Wang and Jixun Yao and Lei Xie",
  year = "2023",
  language = "English",
  volume = "3597",
  pages = "64--69",
  journal = "CEUR Workshop Proceedings",
  issn = "1613-0073",
  publisher = "CEUR-WS",
}

TY  - JOUR
T1  - The NPU-ASLP System for Deepfake Algorithm Recognition in ADD 2023 Challenge
AU  - Wang, Ziqian
AU  - Wang, Qing
AU  - Yao, Jixun
AU  - Xie, Lei
PY  - 2023
Y1  - 2023
N2  - This paper describes our NPU-ASLP system for the Deepfake Algorithm Recognition (AR) task in the Audio Deepfake Detection 2023 Challenge. This task is an open-set classification problem focusing on identifying the specific algorithms used to create the deepfake speech utterances. In this task, we introduce a deepfake AR system with contributions in data augmentation, model architecture, fine-tuning strategy, and model ensemble. We first generate training data by applying various data augmentation techniques to the deepfake speech. We then utilize ResNet101 and a long-term temporal-frequency transformer module to better capture audio context dependencies. Moreover, we employ pre-trained WavLM for better feature extraction. Additionally, our content-invariant fine-tuning strategy improves performance. Finally, model ensemble with different representation combinations further enhances performance. Experiments show that our system achieves an F1-score of 0.7355 on the evaluation set, and ranks fourth in the challenge.
AB  - This paper describes our NPU-ASLP system for the Deepfake Algorithm Recognition (AR) task in the Audio Deepfake Detection 2023 Challenge. This task is an open-set classification problem focusing on identifying the specific algorithms used to create the deepfake speech utterances. In this task, we introduce a deepfake AR system with contributions in data augmentation, model architecture, fine-tuning strategy, and model ensemble. We first generate training data by applying various data augmentation techniques to the deepfake speech. We then utilize ResNet101 and a long-term temporal-frequency transformer module to better capture audio context dependencies. Moreover, we employ pre-trained WavLM for better feature extraction. Additionally, our content-invariant fine-tuning strategy improves performance. Finally, model ensemble with different representation combinations further enhances performance. Experiments show that our system achieves an F1-score of 0.7355 on the evaluation set, and ranks fourth in the challenge.
KW  - Deepfake algorithm recognition
KW  - data augmentation
KW  - model ensemble
KW  - transformer
UR  - http://www.scopus.com/inward/record.url?scp=85181148066&partnerID=8YFLogxK
M3  - Conference article
AN  - SCOPUS:85181148066
SN  - 1613-0073
VL  - 3597
SP  - 64
EP  - 69
JO  - CEUR Workshop Proceedings
JF  - CEUR Workshop Proceedings
T2  - 2023 Workshop on Deepfake Audio Detection and Analysis, DADA 2023
Y2  - 19 August 2023
ER  - 
