The NPU-ASLP System for Deepfake Algorithm Recognition in ADD 2023 Challenge

Ziqian Wang; Qing Wang; Jixun Yao; Lei Xie

School of Computer Science

Northwestern Polytechnical University Xian

Research output: Contribution to journal › Conference article › peer-review

1 Citation (Scopus)

Abstract

This paper describes our NPU-ASLP system for the Deepfake Algorithm Recognition (AR) task in the Audio Deepfake Detection 2023 Challenge. This task is an open-set classification problem focusing on identifying the specific algorithms used to create the deepfake speech utterances. In this task, we introduce a deepfake AR system with contributions in data augmentation, model architecture, fine-tuning strategy, and model ensemble. We first generate training data by applying various data augmentation techniques to the deepfake speech. We then utilize ResNet101 and a long-term temporal-frequency transformer module to better capture audio context dependencies. Moreover, we employ pre-trained WavLM for better feature extraction. Additionally, our content-invariant fine-tuning strategy improves performance. Finally, model ensemble with different representation combinations further enhances performance. Experiments show that our system achieves an F1-score of 0.7355 on the evaluation set, and ranks fourth in the challenge.
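The abstract does not say how ensemble members are combined or how utterances from unseen algorithms are rejected. As a rough illustration only, the sketch below assumes one common approach: average the softmax outputs of several models, then return an "unknown" label when the top averaged probability falls below a threshold. The function names, the `UNKNOWN` label, and the 0.5 threshold are hypothetical and are not taken from the paper.

```python
import math

# Hypothetical label for "generated by an algorithm not seen in training".
UNKNOWN = -1


def softmax(logits):
    """Numerically stable softmax over a list of class logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_open_set_predict(logits_per_model, threshold=0.5):
    """Classify one utterance from the logits of several models.

    logits_per_model: one list of class logits per ensemble member.
    threshold: assumed rejection threshold on the averaged probability.
    """
    probs_per_model = [softmax(logits) for logits in logits_per_model]
    n_models = len(probs_per_model)
    n_classes = len(probs_per_model[0])

    # Simple score-level fusion: mean probability per class.
    avg = [sum(p[c] for p in probs_per_model) / n_models for c in range(n_classes)]

    # Open-set decision: keep the top class only if it is confident enough.
    best = max(range(n_classes), key=avg.__getitem__)
    return best if avg[best] >= threshold else UNKNOWN
```

With two models that both strongly favor class 0, the function returns 0; when both models are close to uniform, it returns `UNKNOWN`. In practice the threshold would be tuned on a development set.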

Original language	English
Pages (from-to)	64-69
Number of pages	6
Journal	CEUR Workshop Proceedings
Volume	3597
Publication status	Published - 2023
Event	2023 Workshop on Deepfake Audio Detection and Analysis, DADA 2023 - Macao, China. Duration: 19 Aug 2023 → …

Cite this

@article{5dd963c28f0443f693b4c12bfd8aae6d,

title = "The NPU-ASLP System for Deepfake Algorithm Recognition in ADD 2023 Challenge",

abstract = "This paper describes our NPU-ASLP system for the Deepfake Algorithm Recognition (AR) task in the Audio Deepfake Detection 2023 Challenge. This task is an open-set classification problem focusing on identifying the specific algorithms used to create the deepfake speech utterances. In this task, we introduce a deepfake AR system with contributions in data augmentation, model architecture, fine-tuning strategy, and model ensemble. We first generate training data by applying various data augmentation techniques to the deepfake speech. We then utilize ResNet101 and a long-term temporal-frequency transformer module to better capture audio context dependencies. Moreover, we employ pre-trained WavLM for better feature extraction. Additionally, our content-invariant fine-tuning strategy improves performance. Finally, model ensemble with different representation combinations further enhances performance. Experiments show that our system achieves an F1-score of 0.7355 on the evaluation set, and ranks fourth in the challenge.",

keywords = "Deepfake algorithm recognition, data augmentation, model ensemble, transformer",

author = "Ziqian Wang and Qing Wang and Jixun Yao and Lei Xie",

year = "2023",

language = "English",

volume = "3597",

pages = "64--69",

journal = "CEUR Workshop Proceedings",

issn = "1613-0073",

publisher = "CEUR-WS",

}

TY - JOUR

T1 - The NPU-ASLP System for Deepfake Algorithm Recognition in ADD 2023 Challenge

AU - Wang, Ziqian

AU - Wang, Qing

AU - Yao, Jixun

AU - Xie, Lei

PY - 2023

Y1 - 2023

N2 - This paper describes our NPU-ASLP system for the Deepfake Algorithm Recognition (AR) task in the Audio Deepfake Detection 2023 Challenge. This task is an open-set classification problem focusing on identifying the specific algorithms used to create the deepfake speech utterances. In this task, we introduce a deepfake AR system with contributions in data augmentation, model architecture, fine-tuning strategy, and model ensemble. We first generate training data by applying various data augmentation techniques to the deepfake speech. We then utilize ResNet101 and a long-term temporal-frequency transformer module to better capture audio context dependencies. Moreover, we employ pre-trained WavLM for better feature extraction. Additionally, our content-invariant fine-tuning strategy improves performance. Finally, model ensemble with different representation combinations further enhances performance. Experiments show that our system achieves an F1-score of 0.7355 on the evaluation set, and ranks fourth in the challenge.

KW - Deepfake algorithm recognition

KW - data augmentation

KW - model ensemble

KW - transformer

UR - http://www.scopus.com/inward/record.url?scp=85181148066&partnerID=8YFLogxK

M3 - Conference article

AN - SCOPUS:85181148066

SN - 1613-0073

VL - 3597

SP - 64

EP - 69

JO - CEUR Workshop Proceedings

JF - CEUR Workshop Proceedings

T2 - 2023 Workshop on Deepfake Audio Detection and Analysis, DADA 2023

Y2 - 19 August 2023

ER -
