TEA-PSE 2.0: Sub-Band Network for Real-Time Personalized Speech Enhancement

Yukai Ju, Shimin Zhang, Wei Rao, Yannan Wang, Tao Yu, Lei Xie, Shidong Shang

Research output: Chapter in book/report/conference proceeding, conference contribution, peer-reviewed

21 citations (Scopus)

Abstract

Personalized speech enhancement (PSE) uses additional cues, such as speaker embeddings, to remove background noise and interfering speech and to extract the target speaker's speech. Previous work, the Tencent-Ethereal-Audio-Lab personalized speech enhancement (TEA-PSE) system, ranked 1st in the ICASSP 2022 deep noise suppression (DNS2022) challenge. In this paper, we expand TEA-PSE to a sub-band version, TEA-PSE 2.0, to reduce computational complexity and further improve performance. Specifically, we adopt finite impulse response filter banks and spectrum splitting to reduce computational complexity. We introduce a time frequency convolution module (TFCM) into the system to increase the receptive field with small convolution kernels. In addition, we explore several training strategies to optimize the two-stage network and investigate various loss functions for the PSE task. TEA-PSE 2.0 significantly outperforms TEA-PSE in both speech enhancement performance and computational complexity. Experimental results on the DNS2022 blind test set show that TEA-PSE 2.0 improves the OVRL personalized DNSMOS score by 0.102 while using only 21.9% of the multiply-accumulate operations of the previous TEA-PSE.
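
To make the idea of spectrum splitting concrete, the following is a minimal sketch, not the authors' implementation. It assumes the full-band spectrogram is cut into equal-width, contiguous sub-bands that are stacked along a new leading axis, so that a network would process fewer frequency bins per band. The function names, the choice of 4 bands, and the 512-bin spectrogram size are all hypothetical and are not taken from the paper.

```python
import numpy as np


def split_subbands(spec: np.ndarray, num_bands: int) -> np.ndarray:
    """Split a (freq, time) spectrogram into equal-width contiguous sub-bands.

    Returns an array of shape (num_bands, freq // num_bands, time).
    The equal-width layout is an assumption made for illustration.
    """
    freq_bins, frames = spec.shape
    if freq_bins % num_bands != 0:
        raise ValueError("freq_bins must be divisible by num_bands")
    # Row-major reshape keeps each band's frequency bins contiguous.
    return spec.reshape(num_bands, freq_bins // num_bands, frames)


def merge_subbands(bands: np.ndarray) -> np.ndarray:
    """Inverse of split_subbands: concatenate sub-bands along frequency."""
    num_bands, band_width, frames = bands.shape
    return bands.reshape(num_bands * band_width, frames)


# Hypothetical sizes: 512 frequency bins, 10 frames, 4 sub-bands.
rng = np.random.default_rng(0)
spec = rng.standard_normal((512, 10)) + 1j * rng.standard_normal((512, 10))
bands = split_subbands(spec, num_bands=4)
restored = merge_subbands(bands)
```

In this sketch each sub-band covers a quarter of the frequency axis, and merging the bands recovers the original spectrogram exactly, which is the property a split/merge pair around an enhancement network would need.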

Original language: English
Title of host publication: 2022 IEEE Spoken Language Technology Workshop, SLT 2022 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 472-479
Number of pages: 8
ISBN (electronic): 9798350396904
Publication status: Published - 2023
Event: 2022 IEEE Spoken Language Technology Workshop, SLT 2022 - Doha, Qatar
Duration: 9 Jan 2023 to 12 Jan 2023

Publication series

Name: 2022 IEEE Spoken Language Technology Workshop, SLT 2022 - Proceedings

Conference

Conference: 2022 IEEE Spoken Language Technology Workshop, SLT 2022
Country/Territory: Qatar
City: Doha
Period: 9/01/23 to 12/01/23
