TEA-PSE 2.0: Sub-Band Network for Real-Time Personalized Speech Enhancement

Yukai Ju; Shimin Zhang; Wei Rao; Yannan Wang; Tao Yu; Lei Xie; Shidong Shang

doi:10.1109/SLT54892.2023.10023174

School of Computer Science

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

21 Scopus citations

Abstract

Personalized speech enhancement (PSE) uses additional cues, such as speaker embeddings, to remove background noise and interfering speech and to extract the speech of the target speaker. Our previous system, the Tencent-Ethereal-Audio-Lab personalized speech enhancement (TEA-PSE) system, ranked 1st in the ICASSP 2022 Deep Noise Suppression (DNS2022) challenge. In this paper, we extend TEA-PSE to a sub-band version, TEA-PSE 2.0, to reduce computational complexity and further improve performance. Specifically, we adopt finite impulse response (FIR) filter banks and spectrum splitting to reduce computational complexity. We introduce a time-frequency convolution module (TFCM) into the system to enlarge the receptive field while using small convolution kernels. In addition, we explore several training strategies for optimizing the two-stage network and investigate various loss functions for the PSE task. TEA-PSE 2.0 significantly outperforms TEA-PSE in both enhancement performance and computational complexity. Experimental results on the DNS2022 blind test set show that TEA-PSE 2.0 improves the personalized DNSMOS OVRL score by 0.102 while using only 21.9% of the multiply-accumulate operations (MACs) of the previous TEA-PSE.
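The abstract credits the TFCM with enlarging the receptive field while keeping convolution kernels small. This record does not give the module's layer configuration, so the sketch below assumes one common way to get that effect: stacking stride-1 convolutions along the time axis whose dilation doubles at each layer. The kernel size of 3, the six layers, and the helper name `receptive_field` are hypothetical choices for illustration, not values taken from the paper.

```python
from typing import Sequence


def receptive_field(kernel_size: int, dilations: Sequence[int]) -> int:
    """Receptive field, in frames, of a stack of stride-1 1-D convolutions.

    Each layer with kernel size k and dilation d widens the field by (k - 1) * d.
    """
    rf = 1  # a single output frame before any convolution is applied
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf


# Hypothetical configuration: six layers, each with a small kernel of size 3.
KERNEL_SIZE = 3
NUM_LAYERS = 6

plain = [1] * NUM_LAYERS                        # dilation 1 in every layer
dilated = [2 ** i for i in range(NUM_LAYERS)]   # dilations 1, 2, 4, 8, 16, 32

# Both stacks use the same number of taps per layer (KERNEL_SIZE);
# only the spacing between the taps differs.
print("undilated stack:", receptive_field(KERNEL_SIZE, plain), "frames")   # 13 frames
print("dilated stack:", receptive_field(KERNEL_SIZE, dilated), "frames")   # 127 frames
```

With the same depth and kernel size, doubling the dilation makes the receptive field grow roughly exponentially with depth (127 versus 13 frames here) rather than linearly, at no extra weight cost. In a real-time system the time-axis convolutions would also have to be causal, so the added context would cover only past frames; how the TFCM handles this is not stated in this record.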

Original language	English
Title of host publication	2022 IEEE Spoken Language Technology Workshop, SLT 2022 - Proceedings
Publisher	Institute of Electrical and Electronics Engineers Inc.
Pages	472-479
Number of pages	8
ISBN (Electronic)	9798350396904
DOIs	https://doi.org/10.1109/SLT54892.2023.10023174
State	Published - 2023
Event	2022 IEEE Spoken Language Technology Workshop, SLT 2022 - Doha, Qatar. Duration: 9 Jan 2023 → 12 Jan 2023

Publication series

Name	2022 IEEE Spoken Language Technology Workshop, SLT 2022 - Proceedings

Conference

Conference	2022 IEEE Spoken Language Technology Workshop, SLT 2022
Country/Territory	Qatar
City	Doha
Period	9 Jan 2023 → 12 Jan 2023

Keywords

deep learning
personalized speech enhancement
real-time
sub-band

Cite this

Ju, Y., Zhang, S., Rao, W., Wang, Y., Yu, T., Xie, L., & Shang, S. (2023). TEA-PSE 2.0: Sub-Band Network for Real-Time Personalized Speech Enhancement. In 2022 IEEE Spoken Language Technology Workshop, SLT 2022 - Proceedings (pp. 472-479). (2022 IEEE Spoken Language Technology Workshop, SLT 2022 - Proceedings). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/SLT54892.2023.10023174

@inproceedings{dd6bfb4d5c8844eb96aa5be40847f14f,

title = "{TEA-PSE} 2.0: Sub-Band Network for Real-Time Personalized Speech Enhancement",

abstract = "Personalized speech enhancement (PSE) uses additional cues, such as speaker embeddings, to remove background noise and interfering speech and to extract the speech of the target speaker. Our previous system, the Tencent-Ethereal-Audio-Lab personalized speech enhancement (TEA-PSE) system, ranked 1st in the ICASSP 2022 Deep Noise Suppression (DNS2022) challenge. In this paper, we extend TEA-PSE to a sub-band version, TEA-PSE 2.0, to reduce computational complexity and further improve performance. Specifically, we adopt finite impulse response (FIR) filter banks and spectrum splitting to reduce computational complexity. We introduce a time-frequency convolution module (TFCM) into the system to enlarge the receptive field while using small convolution kernels. In addition, we explore several training strategies for optimizing the two-stage network and investigate various loss functions for the PSE task. TEA-PSE 2.0 significantly outperforms TEA-PSE in both enhancement performance and computational complexity. Experimental results on the DNS2022 blind test set show that TEA-PSE 2.0 improves the personalized DNSMOS OVRL score by 0.102 while using only 21.9\% of the multiply-accumulate operations (MACs) of the previous TEA-PSE.",

keywords = "deep learning, personalized speech enhancement, real-time, sub-band",

author = "Yukai Ju and Shimin Zhang and Wei Rao and Yannan Wang and Tao Yu and Lei Xie and Shidong Shang",

note = "Publisher Copyright: {\textcopyright} 2023 IEEE. 2022 IEEE Spoken Language Technology Workshop, SLT 2022; Conference date: 9 January 2023 through 12 January 2023",

year = "2023",

doi = "10.1109/SLT54892.2023.10023174",

language = "English",

series = "2022 IEEE Spoken Language Technology Workshop, SLT 2022 - Proceedings",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

pages = "472--479",

booktitle = "2022 IEEE Spoken Language Technology Workshop, SLT 2022 - Proceedings",

}

Ju, Y, Zhang, S, Rao, W, Wang, Y, Yu, T, Xie, L & Shang, S 2023, TEA-PSE 2.0: Sub-Band Network for Real-Time Personalized Speech Enhancement. in 2022 IEEE Spoken Language Technology Workshop, SLT 2022 - Proceedings. 2022 IEEE Spoken Language Technology Workshop, SLT 2022 - Proceedings, Institute of Electrical and Electronics Engineers Inc., pp. 472-479, 2022 IEEE Spoken Language Technology Workshop, SLT 2022, Doha, Qatar, 9/01/23. https://doi.org/10.1109/SLT54892.2023.10023174

TEA-PSE 2.0: Sub-Band Network for Real-Time Personalized Speech Enhancement. / Ju, Yukai; Zhang, Shimin; Rao, Wei et al.
2022 IEEE Spoken Language Technology Workshop, SLT 2022 - Proceedings. Institute of Electrical and Electronics Engineers Inc., 2023. p. 472-479 (2022 IEEE Spoken Language Technology Workshop, SLT 2022 - Proceedings).

TY - GEN

T1 - TEA-PSE 2.0

T2 - 2022 IEEE Spoken Language Technology Workshop, SLT 2022

AU - Ju, Yukai

AU - Zhang, Shimin

AU - Rao, Wei

AU - Wang, Yannan

AU - Yu, Tao

AU - Xie, Lei

AU - Shang, Shidong

PY - 2023

Y1 - 2023

N2 - Personalized speech enhancement (PSE) uses additional cues, such as speaker embeddings, to remove background noise and interfering speech and to extract the speech of the target speaker. Our previous system, the Tencent-Ethereal-Audio-Lab personalized speech enhancement (TEA-PSE) system, ranked 1st in the ICASSP 2022 Deep Noise Suppression (DNS2022) challenge. In this paper, we extend TEA-PSE to a sub-band version, TEA-PSE 2.0, to reduce computational complexity and further improve performance. Specifically, we adopt finite impulse response (FIR) filter banks and spectrum splitting to reduce computational complexity. We introduce a time-frequency convolution module (TFCM) into the system to enlarge the receptive field while using small convolution kernels. In addition, we explore several training strategies for optimizing the two-stage network and investigate various loss functions for the PSE task. TEA-PSE 2.0 significantly outperforms TEA-PSE in both enhancement performance and computational complexity. Experimental results on the DNS2022 blind test set show that TEA-PSE 2.0 improves the personalized DNSMOS OVRL score by 0.102 while using only 21.9% of the multiply-accumulate operations (MACs) of the previous TEA-PSE.

AB - Personalized speech enhancement (PSE) uses additional cues, such as speaker embeddings, to remove background noise and interfering speech and to extract the speech of the target speaker. Our previous system, the Tencent-Ethereal-Audio-Lab personalized speech enhancement (TEA-PSE) system, ranked 1st in the ICASSP 2022 Deep Noise Suppression (DNS2022) challenge. In this paper, we extend TEA-PSE to a sub-band version, TEA-PSE 2.0, to reduce computational complexity and further improve performance. Specifically, we adopt finite impulse response (FIR) filter banks and spectrum splitting to reduce computational complexity. We introduce a time-frequency convolution module (TFCM) into the system to enlarge the receptive field while using small convolution kernels. In addition, we explore several training strategies for optimizing the two-stage network and investigate various loss functions for the PSE task. TEA-PSE 2.0 significantly outperforms TEA-PSE in both enhancement performance and computational complexity. Experimental results on the DNS2022 blind test set show that TEA-PSE 2.0 improves the personalized DNSMOS OVRL score by 0.102 while using only 21.9% of the multiply-accumulate operations (MACs) of the previous TEA-PSE.

KW - deep learning

KW - personalized speech enhancement

KW - real-time

KW - sub-band

UR - http://www.scopus.com/inward/record.url?scp=85147794013&partnerID=8YFLogxK

U2 - 10.1109/SLT54892.2023.10023174

DO - 10.1109/SLT54892.2023.10023174

M3 - Conference contribution

AN - SCOPUS:85147794013

T3 - 2022 IEEE Spoken Language Technology Workshop, SLT 2022 - Proceedings

SP - 472

EP - 479

BT - 2022 IEEE Spoken Language Technology Workshop, SLT 2022 - Proceedings

PB - Institute of Electrical and Electronics Engineers Inc.

Y2 - 9 January 2023 through 12 January 2023

ER -

Ju Y, Zhang S, Rao W, Wang Y, Yu T, Xie L et al. TEA-PSE 2.0: Sub-Band Network for Real-Time Personalized Speech Enhancement. In 2022 IEEE Spoken Language Technology Workshop, SLT 2022 - Proceedings. Institute of Electrical and Electronics Engineers Inc. 2023. p. 472-479. (2022 IEEE Spoken Language Technology Workshop, SLT 2022 - Proceedings). doi: 10.1109/SLT54892.2023.10023174