F-T-LSTM based complex network for joint acoustic echo cancellation and speech enhancement

Shimin Zhang; Yuxiang Kong; Shubo Lv; Yanxin Hu; Lei Xie

doi:10.21437/Interspeech.2021-1359

School of Computer Science

Northwestern Polytechnical University Xian

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

10 Scopus citations

Abstract

With the increasing demand for audio communication and online conferencing, ensuring the robustness of acoustic echo cancellation (AEC) in complicated acoustic scenarios involving noise, reverberation, and nonlinear distortion has become a top issue. Although some traditional methods consider nonlinear distortion, they remain inefficient at echo suppression, and their performance degrades when noise is present. In this paper, we present a real-time AEC approach that uses a complex neural network to better model the important phase information and frequency-time LSTMs (F-T-LSTM), which scan both the frequency and time axes, for better temporal modeling. Moreover, we use a modified SI-SNR as the cost function so that the model achieves better echo cancellation and noise suppression (NS) performance. With only 1.4M parameters, the proposed approach outperforms the AEC-Challenge baseline by 0.27 in terms of Mean Opinion Score (MOS).
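The abstract names a modified SI-SNR as the cost function but does not state what the modification is. As a point of reference only, the sketch below computes the standard scale-invariant signal-to-noise ratio (SI-SNR) between an estimated and a target waveform in plain Python. The function name `si_snr`, the `eps` stabilizer, and the mean-removal step are assumptions of this sketch, not details taken from the paper.

```python
import math


def si_snr(estimate, target, eps=1e-8):
    """Standard SI-SNR in dB between two equal-length real signals.

    This is the common textbook definition, NOT the paper's modified
    variant (which the abstract does not specify).
    """
    if len(estimate) != len(target):
        raise ValueError("estimate and target must have the same length")
    n = len(target)

    # Remove the mean so a constant offset does not affect the score.
    est_mean = sum(estimate) / n
    tgt_mean = sum(target) / n
    est = [x - est_mean for x in estimate]
    tgt = [x - tgt_mean for x in target]

    # Project the estimate onto the target to get the scaled target part.
    dot = sum(e * t for e, t in zip(est, tgt))
    tgt_energy = sum(t * t for t in tgt)
    scale = dot / (tgt_energy + eps)
    s_target = [scale * t for t in tgt]

    # Everything left over is treated as residual echo, noise, or distortion.
    e_noise = [e - s for e, s in zip(est, s_target)]

    num = sum(s * s for s in s_target)
    den = sum(e * e for e in e_noise)
    return 10.0 * math.log10((num + eps) / (den + eps))


if __name__ == "__main__":
    # Hypothetical toy signals: a sine "near-end speech" target plus a
    # small deterministic disturbance standing in for residual echo.
    target = [math.sin(0.1 * i) for i in range(200)]
    estimate = [t + 0.1 * math.cos(0.37 * i) for i, t in enumerate(target)]
    print(f"SI-SNR: {si_snr(estimate, target):.2f} dB")
```

When used as a training objective, the value is typically negated so that minimizing the loss maximizes SI-SNR. Because the score is invariant to rescaling of the estimate, it rewards waveform shape rather than output level.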

Original language	English
Title of host publication	22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021
Publisher	International Speech Communication Association
Pages	791-795
Number of pages	5
ISBN (Electronic)	9781713836902
DOIs	https://doi.org/10.21437/Interspeech.2021-1359
State	Published - 2021
Event	22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021 - Brno, Czech Republic Duration: 30 Aug 2021 → 3 Sep 2021

Publication series

Name	Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume	2
ISSN (Print)	2308-457X
ISSN (Electronic)	1990-9772

Conference

Conference	22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021
Country/Territory	Czech Republic
City	Brno
Period	30/08/21 → 3/09/21

Keywords

Acoustic echo cancellation
Complex network
Noise suppression
Nonlinear distortion

Access to Document

10.21437/Interspeech.2021-1359

Cite this

Zhang, S., Kong, Y., Lv, S., Hu, Y., & Xie, L. (2021). F-T-LSTM based complex network for joint acoustic echo cancellation and speech enhancement. In 22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021 (pp. 791-795). (Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH; Vol. 2). International Speech Communication Association. https://doi.org/10.21437/Interspeech.2021-1359

Zhang, Shimin ; Kong, Yuxiang ; Lv, Shubo et al. / F-T-LSTM based complex network for joint acoustic echo cancellation and speech enhancement. 22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021. International Speech Communication Association, 2021. pp. 791-795 (Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH).

@inproceedings{eb550a1b6f4c47238ccc7b5276571e1f,

title = "F-T-LSTM based complex network for joint acoustic echo cancellation and speech enhancement",

abstract = "With the increasing demand for audio communication and online conferencing, ensuring the robustness of acoustic echo cancellation (AEC) in complicated acoustic scenarios involving noise, reverberation, and nonlinear distortion has become a top issue. Although some traditional methods consider nonlinear distortion, they remain inefficient at echo suppression, and their performance degrades when noise is present. In this paper, we present a real-time AEC approach that uses a complex neural network to better model the important phase information and frequency-time LSTMs (F-T-LSTM), which scan both the frequency and time axes, for better temporal modeling. Moreover, we use a modified SI-SNR as the cost function so that the model achieves better echo cancellation and noise suppression (NS) performance. With only 1.4M parameters, the proposed approach outperforms the AEC-Challenge baseline by 0.27 in terms of Mean Opinion Score (MOS).",

keywords = "Acoustic echo cancellation, Complex network, Noise suppression, Nonlinear distortion",

author = "Shimin Zhang and Yuxiang Kong and Shubo Lv and Yanxin Hu and Lei Xie",

note = "Publisher Copyright: Copyright {\textcopyright} 2021 ISCA.; 22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021 ; Conference date: 30-08-2021 Through 03-09-2021",

year = "2021",

doi = "10.21437/Interspeech.2021-1359",

language = "English",

series = "Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH",

publisher = "International Speech Communication Association",

pages = "791--795",

booktitle = "22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021",

}

Zhang, S, Kong, Y, Lv, S, Hu, Y & Xie, L 2021, F-T-LSTM based complex network for joint acoustic echo cancellation and speech enhancement. in 22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021. Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, vol. 2, International Speech Communication Association, pp. 791-795, 22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021, Brno, Czech Republic, 30/08/21. https://doi.org/10.21437/Interspeech.2021-1359

F-T-LSTM based complex network for joint acoustic echo cancellation and speech enhancement. / Zhang, Shimin; Kong, Yuxiang; Lv, Shubo et al.
22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021. International Speech Communication Association, 2021. p. 791-795 (Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH; Vol. 2).

TY - GEN

T1 - F-T-LSTM based complex network for joint acoustic echo cancellation and speech enhancement

AU - Zhang, Shimin

AU - Kong, Yuxiang

AU - Lv, Shubo

AU - Hu, Yanxin

AU - Xie, Lei

PY - 2021

Y1 - 2021

AB - With the increasing demand for audio communication and online conferencing, ensuring the robustness of acoustic echo cancellation (AEC) in complicated acoustic scenarios involving noise, reverberation, and nonlinear distortion has become a top issue. Although some traditional methods consider nonlinear distortion, they remain inefficient at echo suppression, and their performance degrades when noise is present. In this paper, we present a real-time AEC approach that uses a complex neural network to better model the important phase information and frequency-time LSTMs (F-T-LSTM), which scan both the frequency and time axes, for better temporal modeling. Moreover, we use a modified SI-SNR as the cost function so that the model achieves better echo cancellation and noise suppression (NS) performance. With only 1.4M parameters, the proposed approach outperforms the AEC-Challenge baseline by 0.27 in terms of Mean Opinion Score (MOS).

KW - Acoustic echo cancellation

KW - Complex network

KW - Noise suppression

KW - Nonlinear distortion

UR - http://www.scopus.com/inward/record.url?scp=85119184364&partnerID=8YFLogxK

U2 - 10.21437/Interspeech.2021-1359

DO - 10.21437/Interspeech.2021-1359

M3 - Conference contribution

AN - SCOPUS:85119184364

T3 - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

SP - 791

EP - 795

BT - 22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021

PB - International Speech Communication Association

T2 - 22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021

Y2 - 30 August 2021 through 3 September 2021

ER -

Zhang S, Kong Y, Lv S, Hu Y, Xie L. F-T-LSTM based complex network for joint acoustic echo cancellation and speech enhancement. In 22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021. International Speech Communication Association. 2021. p. 791-795. (Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH). doi: 10.21437/Interspeech.2021-1359
