Time-frequency dual-domain attention for acoustic echo cancellation

Yibo Huang, Weidong Qin, Zhiyong Li, Qiuyu Zhang

Research output: Contribution to journal › Article › peer-review

Abstract

Existing acoustic echo cancellation (AEC) technologies primarily focus on time-domain analysis, aiming to eliminate echo by modeling the long-range correlations of speech signals. However, these methods are limited in their ability to capture the dynamic variations in the frequency components of speech signals, and thus overlook the significance of frequency-domain information. To address this issue, this paper proposes an energy distribution analysis method based on time-frequency (T-F) representation. A dual-domain attention module (DDAM) independently computes local importance weights in the frequency and time domains and multiplies these weights with the input features, capturing the most important time-frequency features of speech signals. In addition, a dual-domain feature enhancement block (DDFEB), which combines DDAM with convolutional layers, further enhances the multilevel representation of the input features and is integrated into the encoder–decoder framework, improving the representation of the time-frequency features. Experimental results show that the proposed method improves the perceptual evaluation of speech quality (PESQ) by 17.65% compared to the existing F-T-LSTM method and achieves a short-time objective intelligibility (STOI) score of 0.93. Furthermore, the proposed method increases the mean opinion score (MOS) by 0.33 compared to the existing DTLN-aec method, demonstrating its superiority in enhancing the user experience.
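The dual-domain weighting idea described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' DDAM: the abstract does not specify how the weights are computed, so the mean pooling, the fixed 3-tap smoothing kernel, the sigmoid, and all function names below are assumptions chosen only to show one frequency-axis weight vector and one time-axis weight vector being computed independently and multiplied with the input features.

```python
import numpy as np


def sigmoid(x):
    """Squash values into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


def local_weights(profile, kernel):
    """Smooth a 1-D profile over a local neighbourhood, then squash to (0, 1).

    Hypothetical stand-in for a learned local attention layer.
    """
    smoothed = np.convolve(profile, kernel, mode="same")
    return sigmoid(smoothed)


def dual_domain_attention(features, kernel=None):
    """Apply independent frequency and time weights to T-F features.

    features: array of shape (freq_bins, time_frames), e.g. a magnitude
    spectrogram. Returns the reweighted features and both weight arrays.
    """
    if kernel is None:
        # Assumption: a fixed averaging kernel; a trained model would learn it.
        kernel = np.ones(3) / 3.0
    freq_profile = features.mean(axis=1)  # one value per frequency bin
    time_profile = features.mean(axis=0)  # one value per time frame
    freq_w = local_weights(freq_profile, kernel)[:, None]  # shape (F, 1)
    time_w = local_weights(time_profile, kernel)[None, :]  # shape (1, T)
    # Broadcasting multiplies every T-F bin by its frequency and time weight.
    return features * freq_w * time_w, freq_w, time_w


# Toy input: 257 frequency bins by 100 frames of non-negative values.
rng = np.random.default_rng(0)
spec = np.abs(rng.standard_normal((257, 100)))
out, fw, tw = dual_domain_attention(spec)
```

In this sketch the two weight vectors are computed from separate pooled profiles, so neither depends on the other, and the output keeps the input's shape. A real module would replace the fixed kernel with trainable layers and operate on multi-channel feature maps.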

Original language: English
Article number: 739
Journal: Journal of Supercomputing
Volume: 81
Issue number: 5
DOI
Publication status: Published - Apr 2025
Externally published
