UFORMER: A UNET BASED DILATED COMPLEX & REAL DUAL-PATH CONFORMER NETWORK FOR SIMULTANEOUS SPEECH ENHANCEMENT AND DEREVERBERATION

Yihui Fu; Yun Liu; Jingdong Li; Dawei Luo; Shubo Lv; Yukai Jv; Lei Xie

doi:10.1109/ICASSP43922.2022.9746020

UFORMER: A UNET BASED DILATED COMPLEX & REAL DUAL-PATH CONFORMER NETWORK FOR SIMULTANEOUS SPEECH ENHANCEMENT AND DEREVERBERATION

Yihui Fu, Yun Liu, Jingdong Li, Dawei Luo, Shubo Lv, Yukai Jv, Lei Xie

计算机学院

科研成果: 书/报告/会议事项章节 › 会议稿件 › 同行评审

51 引用（Scopus）

摘要

Complex spectrum and magnitude are considered as two major features of speech enhancement and dereverberation. Traditional approaches always treat these two features separately, ignoring their underlying relationship. In this paper, we propose Uformer, a Unet based dilated complex & real dual-path conformer network in both complex and magnitude domain for simultaneous speech enhancement and dereverberation. We exploit time attention (TA) and dilated convolution (DC) to leverage local and global contextual information and frequency attention (FA) to model dimensional information. These three sub-modules contained in the proposed dilated complex & real dual-path conformer module effectively improve the speech enhancement and dereverberation performance. Furthermore, hybrid encoder and decoder are adopted to simultaneously model the complex spectrum and magnitude and promote the information interaction between two domains. Encoder decoder attention is also applied to enhance the interaction between encoder and decoder. Our experimental results outperform all SOTA time and complex domain models objectively and subjectively. Specifically, Uformer reaches 3.6032 DNSMOS on the blind test set of Interspeech 2021 DNS Challenge, which outperforms all top-performed models. We also carry out ablation experiments to tease apart all proposed sub-modules that are most important.

源语言	英语
主期刊名	2022 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022 - Proceedings
出版商	Institute of Electrical and Electronics Engineers Inc.
页	7417-7421
页数	5
ISBN（电子版）	9781665405409
DOI	https://doi.org/10.1109/ICASSP43922.2022.9746020
出版状态	已出版 - 2022
活动	2022 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2022 - Hybrid, 新加坡期限: 22 5月 2022 → 27 5月 2022

出版系列

姓名	ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
卷	2022-May
ISSN（印刷版）	1520-6149

会议

会议	2022 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2022
国家/地区	新加坡
市	Hybrid
时期	22/05/22 → 27/05/22

访问文件

10.1109/ICASSP43922.2022.9746020

其它文件与链接

链接到 Scopus 的出版物

引用此

Fu, Y., Liu, Y., Li, J., Luo, D., Lv, S., Jv, Y., & Xie, L. (2022). UFORMER: A UNET BASED DILATED COMPLEX & REAL DUAL-PATH CONFORMER NETWORK FOR SIMULTANEOUS SPEECH ENHANCEMENT AND DEREVERBERATION. 在 2022 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022 - Proceedings (页码 7417-7421). (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings; 卷 2022-May). Institute of Electrical and Electronics Engineers Inc.. https://doi.org/10.1109/ICASSP43922.2022.9746020

Fu, Yihui ; Liu, Yun ; Li, Jingdong 等. / UFORMER : A UNET BASED DILATED COMPLEX & REAL DUAL-PATH CONFORMER NETWORK FOR SIMULTANEOUS SPEECH ENHANCEMENT AND DEREVERBERATION. 2022 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022 - Proceedings. Institute of Electrical and Electronics Engineers Inc., 2022. 页码 7417-7421 (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings).

@inproceedings{c0d2d58e44f349dd9a4c8a8be16e57aa,

title = "UFORMER: A UNET BASED DILATED COMPLEX & REAL DUAL-PATH CONFORMER NETWORK FOR SIMULTANEOUS SPEECH ENHANCEMENT AND DEREVERBERATION",

abstract = "Complex spectrum and magnitude are considered as two major features of speech enhancement and dereverberation. Traditional approaches always treat these two features separately, ignoring their underlying relationship. In this paper, we propose Uformer, a Unet based dilated complex & real dual-path conformer network in both complex and magnitude domain for simultaneous speech enhancement and dereverberation. We exploit time attention (TA) and dilated convolution (DC) to leverage local and global contextual information and frequency attention (FA) to model dimensional information. These three sub-modules contained in the proposed dilated complex & real dual-path conformer module effectively improve the speech enhancement and dereverberation performance. Furthermore, hybrid encoder and decoder are adopted to simultaneously model the complex spectrum and magnitude and promote the information interaction between two domains. Encoder decoder attention is also applied to enhance the interaction between encoder and decoder. Our experimental results outperform all SOTA time and complex domain models objectively and subjectively. Specifically, Uformer reaches 3.6032 DNSMOS on the blind test set of Interspeech 2021 DNS Challenge, which outperforms all top-performed models. We also carry out ablation experiments to tease apart all proposed sub-modules that are most important.",

keywords = "Uformer, dilated complex dual-path conformer, encoder decoder attention, hybrid encoder and decoder, speech enhancement and dereverberation",

author = "Yihui Fu and Yun Liu and Jingdong Li and Dawei Luo and Shubo Lv and Yukai Jv and Lei Xie",

note = "Publisher Copyright: {\textcopyright} 2022 IEEE; 2022 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2022 ; Conference date: 22-05-2022 Through 27-05-2022",

year = "2022",

doi = "10.1109/ICASSP43922.2022.9746020",

language = "英语",

series = "ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

pages = "7417--7421",

booktitle = "2022 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022 - Proceedings",

}

Fu, Y, Liu, Y, Li, J, Luo, D, Lv, S, Jv, Y & Xie, L 2022, UFORMER: A UNET BASED DILATED COMPLEX & REAL DUAL-PATH CONFORMER NETWORK FOR SIMULTANEOUS SPEECH ENHANCEMENT AND DEREVERBERATION. 在 2022 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022 - Proceedings. ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, 卷 2022-May, Institute of Electrical and Electronics Engineers Inc., 页码 7417-7421, 2022 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2022, Hybrid, 新加坡, 22/05/22. https://doi.org/10.1109/ICASSP43922.2022.9746020

UFORMER: A UNET BASED DILATED COMPLEX & REAL DUAL-PATH CONFORMER NETWORK FOR SIMULTANEOUS SPEECH ENHANCEMENT AND DEREVERBERATION. / Fu, Yihui; Liu, Yun; Li, Jingdong 等.
2022 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022 - Proceedings. Institute of Electrical and Electronics Engineers Inc., 2022. 页码 7417-7421 (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings; 卷 2022-May).

科研成果: 书/报告/会议事项章节 › 会议稿件 › 同行评审

TY - GEN

T1 - UFORMER

T2 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2022

AU - Fu, Yihui

AU - Liu, Yun

AU - Li, Jingdong

AU - Luo, Dawei

AU - Lv, Shubo

AU - Jv, Yukai

AU - Xie, Lei

PY - 2022

Y1 - 2022

N2 - Complex spectrum and magnitude are considered as two major features of speech enhancement and dereverberation. Traditional approaches always treat these two features separately, ignoring their underlying relationship. In this paper, we propose Uformer, a Unet based dilated complex & real dual-path conformer network in both complex and magnitude domain for simultaneous speech enhancement and dereverberation. We exploit time attention (TA) and dilated convolution (DC) to leverage local and global contextual information and frequency attention (FA) to model dimensional information. These three sub-modules contained in the proposed dilated complex & real dual-path conformer module effectively improve the speech enhancement and dereverberation performance. Furthermore, hybrid encoder and decoder are adopted to simultaneously model the complex spectrum and magnitude and promote the information interaction between two domains. Encoder decoder attention is also applied to enhance the interaction between encoder and decoder. Our experimental results outperform all SOTA time and complex domain models objectively and subjectively. Specifically, Uformer reaches 3.6032 DNSMOS on the blind test set of Interspeech 2021 DNS Challenge, which outperforms all top-performed models. We also carry out ablation experiments to tease apart all proposed sub-modules that are most important.

AB - Complex spectrum and magnitude are considered as two major features of speech enhancement and dereverberation. Traditional approaches always treat these two features separately, ignoring their underlying relationship. In this paper, we propose Uformer, a Unet based dilated complex & real dual-path conformer network in both complex and magnitude domain for simultaneous speech enhancement and dereverberation. We exploit time attention (TA) and dilated convolution (DC) to leverage local and global contextual information and frequency attention (FA) to model dimensional information. These three sub-modules contained in the proposed dilated complex & real dual-path conformer module effectively improve the speech enhancement and dereverberation performance. Furthermore, hybrid encoder and decoder are adopted to simultaneously model the complex spectrum and magnitude and promote the information interaction between two domains. Encoder decoder attention is also applied to enhance the interaction between encoder and decoder. Our experimental results outperform all SOTA time and complex domain models objectively and subjectively. Specifically, Uformer reaches 3.6032 DNSMOS on the blind test set of Interspeech 2021 DNS Challenge, which outperforms all top-performed models. We also carry out ablation experiments to tease apart all proposed sub-modules that are most important.

KW - Uformer

KW - dilated complex dual-path conformer

KW - encoder decoder attention

KW - hybrid encoder and decoder

KW - speech enhancement and dereverberation

UR - http://www.scopus.com/inward/record.url?scp=85131254704&partnerID=8YFLogxK

U2 - 10.1109/ICASSP43922.2022.9746020

DO - 10.1109/ICASSP43922.2022.9746020

M3 - 会议稿件

AN - SCOPUS:85131254704

T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings

SP - 7417

EP - 7421

BT - 2022 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022 - Proceedings

PB - Institute of Electrical and Electronics Engineers Inc.

Y2 - 22 May 2022 through 27 May 2022

ER -

Fu Y, Liu Y, Li J, Luo D, Lv S, Jv Y 等. UFORMER: A UNET BASED DILATED COMPLEX & REAL DUAL-PATH CONFORMER NETWORK FOR SIMULTANEOUS SPEECH ENHANCEMENT AND DEREVERBERATION. 在 2022 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022 - Proceedings. Institute of Electrical and Electronics Engineers Inc. 2022. 页码 7417-7421. (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings). doi: 10.1109/ICASSP43922.2022.9746020

UFORMER: A UNET BASED DILATED COMPLEX & REAL DUAL-PATH CONFORMER NETWORK FOR SIMULTANEOUS SPEECH ENHANCEMENT AND DEREVERBERATION

摘要

出版系列

会议

访问文件

其它文件与链接

指纹

引用此