TY - GEN
T1 - F-T-LSTM based complex network for joint acoustic echo cancellation and speech enhancement
AU - Zhang, Shimin
AU - Kong, Yuxiang
AU - Lv, Shubo
AU - Hu, Yanxin
AU - Xie, Lei
N1 - Publisher Copyright: Copyright © 2021 ISCA.
PY - 2021
Y1 - 2021
N2 - With the increasing demand for audio communication and online conferencing, ensuring the robustness of Acoustic Echo Cancellation (AEC) in complicated acoustic scenarios involving noise, reverberation and nonlinear distortion has become a top issue. Although some traditional methods consider nonlinear distortion, they remain inefficient at echo suppression, and their performance degrades when noise is present. In this paper, we present a real-time AEC approach that uses a complex neural network to better model the important phase information and frequency-time LSTMs (F-T-LSTM), which scan both the frequency and time axes, for better temporal modeling. Moreover, we use a modified SI-SNR as the cost function to give the model better echo cancellation and noise suppression (NS) performance. With only 1.4M parameters, the proposed approach outperforms the AEC-Challenge baseline by 0.27 in terms of Mean Opinion Score (MOS).
AB - With the increasing demand for audio communication and online conferencing, ensuring the robustness of Acoustic Echo Cancellation (AEC) in complicated acoustic scenarios involving noise, reverberation and nonlinear distortion has become a top issue. Although some traditional methods consider nonlinear distortion, they remain inefficient at echo suppression, and their performance degrades when noise is present. In this paper, we present a real-time AEC approach that uses a complex neural network to better model the important phase information and frequency-time LSTMs (F-T-LSTM), which scan both the frequency and time axes, for better temporal modeling. Moreover, we use a modified SI-SNR as the cost function to give the model better echo cancellation and noise suppression (NS) performance. With only 1.4M parameters, the proposed approach outperforms the AEC-Challenge baseline by 0.27 in terms of Mean Opinion Score (MOS).
KW - Acoustic echo cancellation
KW - Complex network
KW - Noise suppression
KW - Nonlinear distortion
UR - http://www.scopus.com/inward/record.url?scp=85119184364&partnerID=8YFLogxK
U2 - 10.21437/Interspeech.2021-1359
DO - 10.21437/Interspeech.2021-1359
M3 - Conference contribution
AN - SCOPUS:85119184364
T3 - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
SP - 791
EP - 795
BT - 22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021
PB - International Speech Communication Association
T2 - 22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021
Y2 - 30 August 2021 through 3 September 2021
ER -