Personalized Acoustic Echo Cancellation for Full-duplex Communications

Shimin Zhang; Ziteng Wang; Yukai Ju; Yihui Fu; Yueyue Na; Qiang Fu; Lei Xie

doi:10.21437/Interspeech.2022-10225

Personalized Acoustic Echo Cancellation for Full-duplex Communications

Shimin Zhang, Ziteng Wang, Yukai Ju, Yihui Fu, Yueyue Na, Qiang Fu, Lei Xie

计算机学院

Northwestern Polytechnical University Xian

科研成果: 期刊稿件 › 会议文章 › 同行评审

6 引用（Scopus）

摘要

Deep neural networks (DNNs) have shown promising results for acoustic echo cancellation (AEC). But the DNN-based AEC models let through all near-end speakers including the interfering speech. In light of recent studies on personalized speech enhancement, we investigate the feasibility of personalized acoustic echo cancellation (PAEC) in this paper for full-duplex communications, where background noise and interfering speakers may coexist with acoustic echoes. Specifically, we first propose a novel backbone neural network termed as gated temporal convolutional neural network (GTCNN) that outperforms state-of-the-art AEC models in performance. Speaker embeddings like d-vectors are further adopted as auxiliary information to guide the GTCNN to focus on the target speaker. A special case in PAEC is that speech snippets of both parties on the call are enrolled. Experimental results show that auxiliary information from either the near-end speaker or the far-end speaker can improve the DNN-based AEC performance. Nevertheless, there is still much room for improvement in the utilization of the finite-dimensional speaker embeddings.

源语言	英语
页（从-至）	2518-2522
页数	5
期刊	Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
卷	2022-September
DOI	https://doi.org/10.21437/Interspeech.2022-10225
出版状态	已出版 - 2022
活动	23rd Annual Conference of the International Speech Communication Association, INTERSPEECH 2022 - Incheon, 韩国期限: 18 9月 2022 → 22 9月 2022

访问文件

10.21437/Interspeech.2022-10225

其它文件与链接

链接到 Scopus 的出版物

引用此

@article{15f2fe5a5a884fa0bea94b4fe4cc37c5,

title = "Personalized Acoustic Echo Cancellation for Full-duplex Communications",

abstract = "Deep neural networks (DNNs) have shown promising results for acoustic echo cancellation (AEC). But the DNN-based AEC models let through all near-end speakers including the interfering speech. In light of recent studies on personalized speech enhancement, we investigate the feasibility of personalized acoustic echo cancellation (PAEC) in this paper for full-duplex communications, where background noise and interfering speakers may coexist with acoustic echoes. Specifically, we first propose a novel backbone neural network termed as gated temporal convolutional neural network (GTCNN) that outperforms state-of-the-art AEC models in performance. Speaker embeddings like d-vectors are further adopted as auxiliary information to guide the GTCNN to focus on the target speaker. A special case in PAEC is that speech snippets of both parties on the call are enrolled. Experimental results show that auxiliary information from either the near-end speaker or the far-end speaker can improve the DNN-based AEC performance. Nevertheless, there is still much room for improvement in the utilization of the finite-dimensional speaker embeddings.",

keywords = "full-duplex communication, personalized acoustic echo cancellation, speaker embedding",

author = "Shimin Zhang and Ziteng Wang and Yukai Ju and Yihui Fu and Yueyue Na and Qiang Fu and Lei Xie",

note = "Publisher Copyright: Copyright {\textcopyright} 2022 ISCA.; 23rd Annual Conference of the International Speech Communication Association, INTERSPEECH 2022 ; Conference date: 18-09-2022 Through 22-09-2022",

year = "2022",

doi = "10.21437/Interspeech.2022-10225",

language = "英语",

volume = "2022-September",

pages = "2518--2522",

journal = "Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH",

issn = "2308-457X",

}

TY - JOUR

T1 - Personalized Acoustic Echo Cancellation for Full-duplex Communications

AU - Zhang, Shimin

AU - Wang, Ziteng

AU - Ju, Yukai

AU - Fu, Yihui

AU - Na, Yueyue

AU - Fu, Qiang

AU - Xie, Lei

PY - 2022

Y1 - 2022

N2 - Deep neural networks (DNNs) have shown promising results for acoustic echo cancellation (AEC). But the DNN-based AEC models let through all near-end speakers including the interfering speech. In light of recent studies on personalized speech enhancement, we investigate the feasibility of personalized acoustic echo cancellation (PAEC) in this paper for full-duplex communications, where background noise and interfering speakers may coexist with acoustic echoes. Specifically, we first propose a novel backbone neural network termed as gated temporal convolutional neural network (GTCNN) that outperforms state-of-the-art AEC models in performance. Speaker embeddings like d-vectors are further adopted as auxiliary information to guide the GTCNN to focus on the target speaker. A special case in PAEC is that speech snippets of both parties on the call are enrolled. Experimental results show that auxiliary information from either the near-end speaker or the far-end speaker can improve the DNN-based AEC performance. Nevertheless, there is still much room for improvement in the utilization of the finite-dimensional speaker embeddings.

AB - Deep neural networks (DNNs) have shown promising results for acoustic echo cancellation (AEC). But the DNN-based AEC models let through all near-end speakers including the interfering speech. In light of recent studies on personalized speech enhancement, we investigate the feasibility of personalized acoustic echo cancellation (PAEC) in this paper for full-duplex communications, where background noise and interfering speakers may coexist with acoustic echoes. Specifically, we first propose a novel backbone neural network termed as gated temporal convolutional neural network (GTCNN) that outperforms state-of-the-art AEC models in performance. Speaker embeddings like d-vectors are further adopted as auxiliary information to guide the GTCNN to focus on the target speaker. A special case in PAEC is that speech snippets of both parties on the call are enrolled. Experimental results show that auxiliary information from either the near-end speaker or the far-end speaker can improve the DNN-based AEC performance. Nevertheless, there is still much room for improvement in the utilization of the finite-dimensional speaker embeddings.

KW - full-duplex communication

KW - personalized acoustic echo cancellation

KW - speaker embedding

UR - http://www.scopus.com/inward/record.url?scp=85140099059&partnerID=8YFLogxK

U2 - 10.21437/Interspeech.2022-10225

DO - 10.21437/Interspeech.2022-10225

M3 - 会议文章

AN - SCOPUS:85140099059

SN - 2308-457X

VL - 2022-September

SP - 2518

EP - 2522

JO - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

JF - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

T2 - 23rd Annual Conference of the International Speech Communication Association, INTERSPEECH 2022

Y2 - 18 September 2022 through 22 September 2022

ER -

Personalized Acoustic Echo Cancellation for Full-duplex Communications

摘要

访问文件

其它文件与链接

指纹

引用此