Personalized Acoustic Echo Cancellation for Full-duplex Communications

Shimin Zhang; Ziteng Wang; Yukai Ju; Yihui Fu; Yueyue Na; Qiang Fu; Lei Xie

doi:10.21437/Interspeech.2022-10225

School of Computer Science

Northwestern Polytechnical University Xian

Research output: Contribution to journal › Conference article › peer-review

6 Scopus citations

Abstract

Deep neural networks (DNNs) have shown promising results for acoustic echo cancellation (AEC). However, DNN-based AEC models pass all near-end speakers, including interfering speech. In light of recent studies on personalized speech enhancement, we investigate the feasibility of personalized acoustic echo cancellation (PAEC) for full-duplex communications, where background noise and interfering speakers may coexist with acoustic echoes. Specifically, we first propose a novel backbone neural network, termed the gated temporal convolutional neural network (GTCNN), that outperforms state-of-the-art AEC models. Speaker embeddings such as d-vectors are then adopted as auxiliary information to guide the GTCNN to focus on the target speaker. A special case in PAEC is that speech snippets of both parties on the call are enrolled. Experimental results show that auxiliary information from either the near-end speaker or the far-end speaker can improve DNN-based AEC performance. Nevertheless, there is still much room for improvement in the utilization of the finite-dimensional speaker embeddings.
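
The abstract names two ingredients, a gated temporal convolution and a speaker embedding used as auxiliary input, without giving layer details. The sketch below is only a generic illustration of those two ideas and is not the authors' GTCNN: it assumes WaveNet-style tanh/sigmoid gating, a causal dilated 1-D convolution over frames, and conditioning by concatenating the embedding to every frame. The function name `gated_causal_conv`, the shapes, and the random weights are all hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gated_causal_conv(frames, spk_emb, w_filter, w_gate, dilation=1):
    """Generic gated causal dilated convolution (illustrative, not the paper's layer).

    frames:   list of T feature vectors, each of length F
    spk_emb:  speaker embedding of length E (e.g. a d-vector), shared by all frames
    w_filter, w_gate: kernels shaped [C_out][K][F + E]
    returns:  list of T output vectors, each of length C_out
    """
    # Assumed conditioning scheme: concatenate the embedding to every frame.
    inputs = [list(f) + list(spk_emb) for f in frames]
    out = []
    for t in range(len(inputs)):
        row = []
        for kf, kg in zip(w_filter, w_gate):  # one kernel pair per output channel
            a = b = 0.0
            for k, (wf, wg) in enumerate(zip(kf, kg)):
                idx = t - k * dilation  # causal: current and past frames only
                if idx < 0:
                    continue  # implicit zero padding on the left
                x = inputs[idx]
                a += sum(w * v for w, v in zip(wf, x))
                b += sum(w * v for w, v in zip(wg, x))
            # Gating: the sigmoid branch scales the tanh branch per channel.
            row.append(math.tanh(a) * sigmoid(b))
        out.append(row)
    return out


# Toy usage with random features and weights (hypothetical sizes).
random.seed(0)
T, F, E, C, K = 6, 4, 3, 2, 2
frames = [[random.uniform(-1, 1) for _ in range(F)] for _ in range(T)]
spk = [random.uniform(-1, 1) for _ in range(E)]


def rand_kernel():
    return [[[random.uniform(-0.5, 0.5) for _ in range(F + E)]
             for _ in range(K)] for _ in range(C)]


w_f, w_g = rand_kernel(), rand_kernel()
y = gated_causal_conv(frames, spk, w_f, w_g, dilation=2)
print(len(y), len(y[0]))
```

Because each output frame depends only on the current and earlier input frames, a layer of this kind can run in a streaming fashion, which is one reason causal convolutions are commonly used for real-time speech processing. Whether the paper conditions on the embedding by concatenation or by another mechanism is not stated in the abstract.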

Original language	English
Pages (from-to)	2518-2522
Number of pages	5
Journal	Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume	2022-September
DOIs	https://doi.org/10.21437/Interspeech.2022-10225
State	Published - 2022
Event	23rd Annual Conference of the International Speech Communication Association, INTERSPEECH 2022 - Incheon, Republic of Korea. Duration: 18 Sep 2022 to 22 Sep 2022

Keywords

full-duplex communication
personalized acoustic echo cancellation
speaker embedding

Cite this

@article{15f2fe5a5a884fa0bea94b4fe4cc37c5,

title = "Personalized Acoustic Echo Cancellation for Full-duplex Communications",

abstract = "Deep neural networks (DNNs) have shown promising results for acoustic echo cancellation (AEC). But the DNN-based AEC models let through all near-end speakers including the interfering speech. In light of recent studies on personalized speech enhancement, we investigate the feasibility of personalized acoustic echo cancellation (PAEC) in this paper for full-duplex communications, where background noise and interfering speakers may coexist with acoustic echoes. Specifically, we first propose a novel backbone neural network termed as gated temporal convolutional neural network (GTCNN) that outperforms state-of-the-art AEC models in performance. Speaker embeddings like d-vectors are further adopted as auxiliary information to guide the GTCNN to focus on the target speaker. A special case in PAEC is that speech snippets of both parties on the call are enrolled. Experimental results show that auxiliary information from either the near-end speaker or the far-end speaker can improve the DNN-based AEC performance. Nevertheless, there is still much room for improvement in the utilization of the finite-dimensional speaker embeddings.",

keywords = "full-duplex communication, personalized acoustic echo cancellation, speaker embedding",

author = "Shimin Zhang and Ziteng Wang and Yukai Ju and Yihui Fu and Yueyue Na and Qiang Fu and Lei Xie",

note = "Publisher Copyright: Copyright {\textcopyright} 2022 ISCA.; 23rd Annual Conference of the International Speech Communication Association, INTERSPEECH 2022 ; Conference date: 18-09-2022 Through 22-09-2022",

year = "2022",

doi = "10.21437/Interspeech.2022-10225",

language = "English",

volume = "2022-September",

pages = "2518--2522",

journal = "Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH",

issn = "2308-457X",

}

TY - JOUR

T1 - Personalized Acoustic Echo Cancellation for Full-duplex Communications

AU - Zhang, Shimin

AU - Wang, Ziteng

AU - Ju, Yukai

AU - Fu, Yihui

AU - Na, Yueyue

AU - Fu, Qiang

AU - Xie, Lei

PY - 2022

Y1 - 2022

AB - Deep neural networks (DNNs) have shown promising results for acoustic echo cancellation (AEC). But the DNN-based AEC models let through all near-end speakers including the interfering speech. In light of recent studies on personalized speech enhancement, we investigate the feasibility of personalized acoustic echo cancellation (PAEC) in this paper for full-duplex communications, where background noise and interfering speakers may coexist with acoustic echoes. Specifically, we first propose a novel backbone neural network termed as gated temporal convolutional neural network (GTCNN) that outperforms state-of-the-art AEC models in performance. Speaker embeddings like d-vectors are further adopted as auxiliary information to guide the GTCNN to focus on the target speaker. A special case in PAEC is that speech snippets of both parties on the call are enrolled. Experimental results show that auxiliary information from either the near-end speaker or the far-end speaker can improve the DNN-based AEC performance. Nevertheless, there is still much room for improvement in the utilization of the finite-dimensional speaker embeddings.

KW - full-duplex communication

KW - personalized acoustic echo cancellation

KW - speaker embedding

UR - http://www.scopus.com/inward/record.url?scp=85140099059&partnerID=8YFLogxK

U2 - 10.21437/Interspeech.2022-10225

DO - 10.21437/Interspeech.2022-10225

M3 - Conference article

AN - SCOPUS:85140099059

SN - 2308-457X

VL - 2022-September

SP - 2518

EP - 2522

JO - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

JF - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

T2 - 23rd Annual Conference of the International Speech Communication Association, INTERSPEECH 2022

Y2 - 18 September 2022 through 22 September 2022

ER -
