MULTI-TASK DEEP RESIDUAL ECHO SUPPRESSION WITH ECHO-AWARE LOSS

Shimin Zhang; Ziteng Wang; Jiayao Sun; Yihui Fu; Biao Tian; Qiang Fu; Lei Xie

doi:10.1109/ICASSP43922.2022.9746733

School of Computer Science

Northwestern Polytechnical University Xian

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

33 Scopus citations

Abstract

This paper introduces the NWPU Team's entry to the ICASSP 2022 AEC Challenge. We take a hybrid approach that cascades a linear AEC with a neural post-filter: the former handles the linear echo components, while the latter suppresses the residual non-linear echo components. We use a gated convolutional F-T-LSTM neural network (GFTNN) as the backbone and shape the post-filter with a multi-task learning (MTL) framework, in which a voice activity detection (VAD) module serves as an auxiliary task alongside echo suppression, with the aim of avoiding over-suppression that may cause speech distortion. Moreover, we adopt an echo-aware loss function, in which the mean square error (MSE) loss is optimized for each time-frequency bin (TF-bin) according to its signal-to-echo ratio (SER), leading to further suppression of the echo. An extensive ablation study shows that the time delay estimation (TDE) module in the neural post-filter leads to better perceptual quality, and that an adaptive filter with better convergence brings a consistent performance gain for the post-filter. We also find that using the linear echo as the input to the neural post-filter is a better choice than using the reference signal directly. In the ICASSP 2022 AEC Challenge, our approach ranked 1st on word accuracy (WAcc, 0.817) and 3rd on both mean opinion score (MOS, 4.502) and the final score (0.864).
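
The abstract does not give the exact form of the echo-aware loss, so the following is only a minimal sketch of the general idea: an MSE in which each TF-bin's squared error is weighted according to that bin's SER. The weighting rule `1 + alpha / (1 + SER)`, the function name `echo_aware_mse`, and the parameters `alpha` and `eps` are all assumptions made for illustration, not the authors' formulation.

```python
from typing import List

# Magnitude spectrogram indexed as [frame][frequency bin].
Spectrogram = List[List[float]]


def echo_aware_mse(estimate: Spectrogram, target: Spectrogram,
                   echo: Spectrogram, alpha: float = 1.0,
                   eps: float = 1e-8) -> float:
    """SER-weighted MSE over TF-bins (hypothetical weighting, see lead-in).

    estimate: post-filter output magnitudes
    target:   clean near-end speech magnitudes
    echo:     echo magnitudes, used only to compute the per-bin SER
    """
    total = 0.0
    count = 0
    for est_frame, tgt_frame, echo_frame in zip(estimate, target, echo):
        for est, tgt, ech in zip(est_frame, tgt_frame, echo_frame):
            # Per-bin signal-to-echo ratio on a linear power scale.
            ser = (tgt * tgt + eps) / (ech * ech + eps)
            # Assumed rule: the weight approaches 1 + alpha in
            # echo-dominated bins (low SER) and 1 in speech-dominated
            # bins (high SER).
            weight = 1.0 + alpha / (1.0 + ser)
            total += weight * (est - tgt) ** 2
            count += 1
    if count == 0:
        raise ValueError("spectrograms must contain at least one TF-bin")
    return total / count


if __name__ == "__main__":
    # Same unit error in both cases, but different echo conditions.
    echo_dominated = echo_aware_mse([[1.0]], [[0.0]], [[1.0]])
    speech_dominated = echo_aware_mse([[0.0]], [[1.0]], [[0.0]])
    print(echo_dominated > speech_dominated)
```

Under this assumed weighting, an error of the same size costs roughly twice as much in an echo-dominated bin as in a speech-dominated one, which pushes a model toward stronger echo suppression where echo dominates while leaving speech-dominated bins close to a plain MSE.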

Original language	English
Title of host publication	2022 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022 - Proceedings
Publisher	Institute of Electrical and Electronics Engineers Inc.
Pages	9127-9131
Number of pages	5
ISBN (Electronic)	9781665405409
DOIs	https://doi.org/10.1109/ICASSP43922.2022.9746733
State	Published - 2022
Event	2022 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2022 - Hybrid, Singapore
Duration	22 May 2022 → 27 May 2022

Publication series

Name	ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Volume	2022-May
ISSN (Print)	1520-6149

Conference

Conference	2022 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2022
Country/Territory	Singapore
City	Hybrid
Period	22/05/22 → 27/05/22

Keywords

Acoustic echo cancellation
multi-task learning
noise suppression

Cite this

Zhang, S., Wang, Z., Sun, J., Fu, Y., Tian, B., Fu, Q., & Xie, L. (2022). MULTI-TASK DEEP RESIDUAL ECHO SUPPRESSION WITH ECHO-AWARE LOSS. In 2022 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022 - Proceedings (pp. 9127-9131). (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings; Vol. 2022-May). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICASSP43922.2022.9746733

@inproceedings{1506cbba78074ec3aa14439162691dcb,

title = "MULTI-TASK DEEP RESIDUAL ECHO SUPPRESSION WITH ECHO-AWARE LOSS",

abstract = "This paper introduces the NWPU Team's entry to the ICASSP 2022 AEC Challenge. We take a hybrid approach that cascades a linear AEC with a neural post-filter. The former is used to deal with the linear echo components while the latter suppresses the residual non-linear echo components. We use gated convolutional F-T-LSTM neural network (GFTNN) as the backbone and shape the post-filter by a multi-task learning (MTL) framework, where a voice activity detection (VAD) module is adopted as an auxiliary task along with echo suppression, with the aim to avoid over suppression that may cause speech distortion. Moreover, we adopt an echo-aware loss function, where the mean square error (MSE) loss can be optimized particularly for every time-frequency bin (TF-bin) according to the signal-to-echo ratio (SER), leading to further suppression on the echo. Extensive ablation study shows that the time delay estimation (TDE) module in neural post-filter leads to better perceptual quality, and an adaptive filter with better convergence will bring consistent performance gain for the post-filter. Besides, we find that using the linear echo as the input of our neural post-filter is a better choice than using the reference signal directly. In the ICASSP 2022 AEC-Challenge, our approach has ranked the 1st place on word accuracy (WAcc) (0.817) and the 3rd place on both mean opinion score (MOS) (4.502) and the final score (0.864).",

keywords = "Acoustic echo cancellation, multi-task learning, noise suppression",

author = "Shimin Zhang and Ziteng Wang and Jiayao Sun and Yihui Fu and Biao Tian and Qiang Fu and Lei Xie",

note = "Publisher Copyright: {\textcopyright} 2022 IEEE; 2022 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2022 ; Conference date: 22-05-2022 Through 27-05-2022",

year = "2022",

doi = "10.1109/ICASSP43922.2022.9746733",

language = "English",

series = "ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

pages = "9127--9131",

booktitle = "2022 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022 - Proceedings",

}

TY - GEN

T1 - MULTI-TASK DEEP RESIDUAL ECHO SUPPRESSION WITH ECHO-AWARE LOSS

AU - Zhang, Shimin

AU - Wang, Ziteng

AU - Sun, Jiayao

AU - Fu, Yihui

AU - Tian, Biao

AU - Fu, Qiang

AU - Xie, Lei

PY - 2022

Y1 - 2022

AB - This paper introduces the NWPU Team's entry to the ICASSP 2022 AEC Challenge. We take a hybrid approach that cascades a linear AEC with a neural post-filter. The former is used to deal with the linear echo components while the latter suppresses the residual non-linear echo components. We use gated convolutional F-T-LSTM neural network (GFTNN) as the backbone and shape the post-filter by a multi-task learning (MTL) framework, where a voice activity detection (VAD) module is adopted as an auxiliary task along with echo suppression, with the aim to avoid over suppression that may cause speech distortion. Moreover, we adopt an echo-aware loss function, where the mean square error (MSE) loss can be optimized particularly for every time-frequency bin (TF-bin) according to the signal-to-echo ratio (SER), leading to further suppression on the echo. Extensive ablation study shows that the time delay estimation (TDE) module in neural post-filter leads to better perceptual quality, and an adaptive filter with better convergence will bring consistent performance gain for the post-filter. Besides, we find that using the linear echo as the input of our neural post-filter is a better choice than using the reference signal directly. In the ICASSP 2022 AEC-Challenge, our approach has ranked the 1st place on word accuracy (WAcc) (0.817) and the 3rd place on both mean opinion score (MOS) (4.502) and the final score (0.864).

KW - Acoustic echo cancellation

KW - multi-task learning

KW - noise suppression

UR - http://www.scopus.com/inward/record.url?scp=85126778676&partnerID=8YFLogxK

U2 - 10.1109/ICASSP43922.2022.9746733

DO - 10.1109/ICASSP43922.2022.9746733

M3 - Conference contribution

AN - SCOPUS:85126778676

T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings

SP - 9127

EP - 9131

BT - 2022 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022 - Proceedings

PB - Institute of Electrical and Electronics Engineers Inc.

T2 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2022

Y2 - 22 May 2022 through 27 May 2022

ER -

