Time-frequency dual-domain attention for acoustic echo cancellation

Yibo Huang; Weidong Qin; Zhiyong Li; Qiuyu Zhang

doi:10.1007/s11227-025-07200-2

Research output: Contribution to journal › Article › peer-review

Abstract

Existing acoustic echo cancellation (AEC) technologies primarily focus on time-domain analysis, aiming to eliminate echo by modeling the long-range correlations of speech signals. However, these methods are limited in their ability to capture the dynamic variations in the frequency components of speech signals, thereby overlooking the significance of frequency-domain information. To address this issue, this paper proposes an energy distribution analysis method based on time-frequency (T-F) representation. It introduces a dual-domain attention module (DDAM), which independently computes local importance weights in the frequency and time domains and multiplies these weights with the input features, so that the most important time-frequency features of speech signals are captured accurately. In addition, the dual-domain feature enhancement block (DDFEB), which combines DDAM and convolutional layers, further enhances the multilevel representation of input features and integrates them into the encoder–decoder framework, effectively improving the representation of the time-frequency features. Experimental results show that the proposed method improves the perceptual evaluation of speech quality (PESQ) by 17.65% compared to the existing F-T-LSTM method and achieves a short-time objective intelligibility (STOI) score of 0.93. Furthermore, the proposed method increases the mean opinion score (MOS) by 0.33 compared to the existing DTLN-aec method, demonstrating its superiority in enhancing the user experience.
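
The abstract describes the DDAM only at a high level, so the following is a rough sketch of the general idea rather than the authors' module: importance weights are computed separately along the frequency axis and the time axis of a T-F feature map, and the input is multiplied by both. Everything specific here is an assumption made for illustration, including the function names (`dual_domain_attention`, `_local_weights`), the use of mean energy as the pooled statistic, the moving-average window standing in for a learned local layer, and the sigmoid gating.

```python
import numpy as np


def _sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def _local_weights(profile, kernel_size=3):
    """Turn a 1-D energy profile into weights in (0, 1).

    Assumption: a moving average supplies the "local" context; a trained
    model would more likely use a learned 1-D convolution here.
    """
    kernel = np.ones(kernel_size) / kernel_size
    smoothed = np.convolve(profile, kernel, mode="same")
    # Standardize so the sigmoid is centered on the average energy.
    normalized = (smoothed - smoothed.mean()) / (smoothed.std() + 1e-8)
    return _sigmoid(normalized)


def dual_domain_attention(features, kernel_size=3):
    """Reweight a (time_frames, freq_bins) feature map along both axes."""
    energy = features ** 2
    # Frequency branch: pool over time, one weight per frequency bin.
    freq_weights = _local_weights(energy.mean(axis=0), kernel_size)
    # Time branch: pool over frequency, one weight per time frame.
    time_weights = _local_weights(energy.mean(axis=1), kernel_size)
    # Both branches are computed independently from the same input and
    # then applied multiplicatively (the combination rule is assumed).
    weighted = features * freq_weights[np.newaxis, :] * time_weights[:, np.newaxis]
    return weighted, freq_weights, time_weights


# Hypothetical input: 100 frames by 257 frequency bins of random magnitudes.
rng = np.random.default_rng(0)
spec = np.abs(rng.normal(size=(100, 257)))
out, fw, tw = dual_domain_attention(spec)
```

Because both weight vectors lie strictly between 0 and 1, this sketch can only attenuate the input; frames and bins with above-average local energy are attenuated least.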

Original language	English
Article number	739
Journal	Journal of Supercomputing
Volume	81
Issue number	5
DOIs	https://doi.org/10.1007/s11227-025-07200-2
State	Published - Apr 2025
Externally published	Yes

Keywords

Acoustic echo cancellation
Dual-domain feature enhancement
Energy distribution
Speech quality assessment
Time-frequency dual-domain attention

Access to Document

10.1007/s11227-025-07200-2

Cite this

@article{450a79fe2b604d4ea445544964f19b0b,

title = "Time-frequency dual-domain attention for acoustic echo cancellation",

abstract = "Existing acoustic echo cancellation (AEC) technologies primarily focus on time-domain analysis, aiming to eliminate echo by modeling the long-range correlations of speech signals. However, these methods are limited in their ability to capture the dynamic variations in the frequency components of speech signals, thereby overlooking the significance of frequency-domain information. This paper proposes an energy distribution analysis method based on time-frequency (T-F) representation to address this issue. Introducing a dual-domain attention module (DDAM), which independently computes the local importance weights in both the frequency and time domains and multiplies these weights with the input features, accurately captures the most important time-frequency features of speech signals. In addition, the dual-domain feature enhancement block (DDFEB), which combines DDAM and convolutional layers, further enhances the multilevel representation of input features and integrates them into the encoder–decoder framework, effectively improving the representation of the time-frequency features. Experimental results show that the proposed method improves the perceptual evaluation of speech quality (PESQ) by 17.65% compared to the existing F-T-LSTM method and achieves a short-time objective intelligibility (STOI) score of 0.93. Furthermore, the proposed method increases the mean opinion score (MOS) by 0.33 compared to the existing DTLN-aec method, demonstrating its superiority in enhancing the user experience.",

keywords = "Acoustic echo cancellation, Dual-domain feature enhancement, Energy distribution, Speech quality assessment, Time-frequency dual-domain attention",

author = "Yibo Huang and Weidong Qin and Zhiyong Li and Qiuyu Zhang",

note = "Publisher Copyright: {\textcopyright} The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2025.",

year = "2025",

month = apr,

doi = "10.1007/s11227-025-07200-2",

language = "English",

volume = "81",

journal = "Journal of Supercomputing",

issn = "0920-8542",

publisher = "Springer Netherlands",

number = "5",

}

TY - JOUR

T1 - Time-frequency dual-domain attention for acoustic echo cancellation

AU - Huang, Yibo

AU - Qin, Weidong

AU - Li, Zhiyong

AU - Zhang, Qiuyu

N1 - Publisher Copyright: © The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2025.

PY - 2025/4

Y1 - 2025/4

N2 - Existing acoustic echo cancellation (AEC) technologies primarily focus on time-domain analysis, aiming to eliminate echo by modeling the long-range correlations of speech signals. However, these methods are limited in their ability to capture the dynamic variations in the frequency components of speech signals, thereby overlooking the significance of frequency-domain information. This paper proposes an energy distribution analysis method based on time-frequency (T-F) representation to address this issue. Introducing a dual-domain attention module (DDAM), which independently computes the local importance weights in both the frequency and time domains and multiplies these weights with the input features, accurately captures the most important time-frequency features of speech signals. In addition, the dual-domain feature enhancement block (DDFEB), which combines DDAM and convolutional layers, further enhances the multilevel representation of input features and integrates them into the encoder–decoder framework, effectively improving the representation of the time-frequency features. Experimental results show that the proposed method improves the perceptual evaluation of speech quality (PESQ) by 17.65% compared to the existing F-T-LSTM method and achieves a short-time objective intelligibility (STOI) score of 0.93. Furthermore, the proposed method increases the mean opinion score (MOS) by 0.33 compared to the existing DTLN-aec method, demonstrating its superiority in enhancing the user experience.

AB - Existing acoustic echo cancellation (AEC) technologies primarily focus on time-domain analysis, aiming to eliminate echo by modeling the long-range correlations of speech signals. However, these methods are limited in their ability to capture the dynamic variations in the frequency components of speech signals, thereby overlooking the significance of frequency-domain information. This paper proposes an energy distribution analysis method based on time-frequency (T-F) representation to address this issue. Introducing a dual-domain attention module (DDAM), which independently computes the local importance weights in both the frequency and time domains and multiplies these weights with the input features, accurately captures the most important time-frequency features of speech signals. In addition, the dual-domain feature enhancement block (DDFEB), which combines DDAM and convolutional layers, further enhances the multilevel representation of input features and integrates them into the encoder–decoder framework, effectively improving the representation of the time-frequency features. Experimental results show that the proposed method improves the perceptual evaluation of speech quality (PESQ) by 17.65% compared to the existing F-T-LSTM method and achieves a short-time objective intelligibility (STOI) score of 0.93. Furthermore, the proposed method increases the mean opinion score (MOS) by 0.33 compared to the existing DTLN-aec method, demonstrating its superiority in enhancing the user experience.

KW - Acoustic echo cancellation

KW - Dual-domain feature enhancement

KW - Energy distribution

KW - Speech quality assessment

KW - Time-frequency dual-domain attention

UR - http://www.scopus.com/inward/record.url?scp=105002980931&partnerID=8YFLogxK

U2 - 10.1007/s11227-025-07200-2

DO - 10.1007/s11227-025-07200-2

M3 - Article

AN - SCOPUS:105002980931

SN - 0920-8542

VL - 81

JO - Journal of Supercomputing

JF - Journal of Supercomputing

IS - 5

M1 - 739

ER -
