Hybrid Dilated and Recursive Recurrent Convolution Network for Time-Domain Speech Enhancement

Zhendong Song; Yupeng Ma; Fang Tan; Xiaoyi Feng

doi:10.3390/app12073461

School of Electronics and Information

Research output: Contribution to journal › Article › peer-review

10 Citations (Scopus)

Abstract

In this paper, we propose a fully convolutional neural network based on recursive recurrent convolution for monaural speech enhancement in the time domain. The proposed network has an encoder-decoder structure built from a series of hybrid dilated modules (HDMs). The encoder creates low-dimensional features from a noisy input frame. In each HDM, dilated convolution is used to expand the receptive field of the model, while standard convolution makes up for the local information that dilated convolution under-utilizes. The decoder reconstructs the enhanced frames. The recursive recurrent convolutional network uses a GRU to address the problem of large numbers of training parameters and complex structures. State-of-the-art results are achieved on two commonly used speech datasets.
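The abstract relies on two mechanisms: dilated convolution to widen the receptive field (with a standard convolution recovering the local detail the dilated taps skip), and a GRU that lets a recursively applied block reuse one set of weights. The sketch below illustrates both ideas in plain Python so it runs without a deep-learning framework. It is not the authors' implementation: the function names `conv1d`, `receptive_field`, `hybrid_block`, and `shared_gru_recursion`, the residual sum used to merge the two branches, and the element-wise (scalar-weight) GRU standing in for a convolutional GRU are all hypothetical choices made for illustration.

```python
import math
from typing import List, Sequence


def conv1d(x: Sequence[float], w: Sequence[float], dilation: int = 1) -> List[float]:
    """Zero-padded, 'same'-length 1-D convolution (cross-correlation, as in
    deep-learning libraries). Kernel taps are spaced `dilation` samples apart."""
    span = (len(w) - 1) * dilation      # distance between first and last tap
    left = span // 2                    # centre the kernel on each output sample
    out = []
    for t in range(len(x)):
        acc = 0.0
        for i, wi in enumerate(w):
            j = t - left + i * dilation
            if 0 <= j < len(x):         # samples outside the frame count as zero
                acc += wi * x[j]
        out.append(acc)
    return out


def receptive_field(kernel_size: int, dilations: Sequence[int]) -> int:
    """Receptive field, in samples, of stride-1 convolutions applied in series."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def hybrid_block(x, w_dilated, w_standard, dilation):
    """Hypothetical hybrid module (not the paper's exact HDM): a dilated branch
    for long-range context plus a standard dilation-1 branch for the local
    detail the dilated taps skip, summed with a residual connection."""
    a = conv1d(x, w_dilated, dilation)
    b = conv1d(x, w_standard, 1)
    return [xi + ai + bi for xi, ai, bi in zip(x, a, b)]


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def shared_gru_recursion(x, steps, w_z, u_z, w_r, u_r, w_h, u_h):
    """Element-wise GRU applied `steps` times with ONE shared set of six scalar
    weights (a toy stand-in for a convolutional GRU). The parameter count does
    not grow with `steps`, which is the point of recursive weight sharing."""
    h = [0.0] * len(x)
    for _ in range(steps):
        z = [sigmoid(w_z * xi + u_z * hi) for xi, hi in zip(x, h)]   # update gate
        r = [sigmoid(w_r * xi + u_r * hi) for xi, hi in zip(x, h)]   # reset gate
        cand = [math.tanh(w_h * xi + u_h * ri * hi)                  # candidate state
                for xi, ri, hi in zip(x, r, h)]
        h = [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]
    return h


if __name__ == "__main__":
    impulse = [0, 0, 0, 0, 1, 0, 0, 0, 0]
    # Dilated taps land 2 samples apart, leaving gaps at indices 3 and 5:
    print(conv1d(impulse, [1, 1, 1], dilation=2))  # [0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0]
    # The standard branch fills those gaps:
    print(hybrid_block(impulse, [1, 1, 1], [1, 1, 1], dilation=2))  # [0.0, 0.0, 1.0, 1.0, 3.0, 1.0, 1.0, 0.0, 0.0]
    print(receptive_field(3, [1, 2, 4, 8]), receptive_field(3, [1, 1, 1, 1]))  # 31 9
```

With a kernel size of 3, four stacked layers with dilations 1, 2, 4, 8 cover 31 samples versus 9 for the same layers undilated, with the same number of weights; the impulse response shows the gaps a dilated kernel leaves, which is the under-used local information the standard branch is meant to supply.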

Original language	English
Article number	3461
Journal	Applied Sciences (Switzerland)
Volume	12
Issue number	7
DOI	https://doi.org/10.3390/app12073461
Publication status	Published - 1 Apr 2022

Cite this

@article{9a9b2f63938f4e92b7d9b51a7dae958f,
  title = "Hybrid Dilated and Recursive Recurrent Convolution Network for Time-Domain Speech Enhancement",
  abstract = "In this paper, we propose a fully convolutional neural network based on recursive recurrent convolution for monaural speech enhancement in the time domain. The proposed network is an encoder-decoder structure using a series of hybrid dilated modules (HDM). The encoder creates low-dimensional features of a noisy input frame. In the HDM, the dilated convolution is used to expand the receptive field of the network model. In contrast, the standard convolution is used to make up for the under-utilized local information of the dilated convolution. The decoder is used to reconstruct enhanced frames. The recursive recurrent convolutional network uses GRU to solve the problem of multiple training parameters and complex structures. State-of-the-art results are achieved on two commonly used speech datasets.",
  keywords = "hybrid dilated convolution, recurrent convolution, speech enhancement, time domain",
  author = "Zhendong Song and Yupeng Ma and Fang Tan and Xiaoyi Feng",
  note = "Publisher Copyright: {\textcopyright} 2022 by the authors. Licensee MDPI, Basel, Switzerland.",
  year = "2022",
  month = apr,
  day = "1",
  doi = "10.3390/app12073461",
  language = "English",
  volume = "12",
  journal = "Applied Sciences (Switzerland)",
  issn = "2076-3417",
  publisher = "Multidisciplinary Digital Publishing Institute (MDPI)",
  number = "7",
}

TY  - JOUR
T1  - Hybrid Dilated and Recursive Recurrent Convolution Network for Time-Domain Speech Enhancement
AU  - Song, Zhendong
AU  - Ma, Yupeng
AU  - Tan, Fang
AU  - Feng, Xiaoyi
PY  - 2022/4/1
Y1  - 2022/4/1
N2  - In this paper, we propose a fully convolutional neural network based on recursive recurrent convolution for monaural speech enhancement in the time domain. The proposed network is an encoder-decoder structure using a series of hybrid dilated modules (HDM). The encoder creates low-dimensional features of a noisy input frame. In the HDM, the dilated convolution is used to expand the receptive field of the network model. In contrast, the standard convolution is used to make up for the under-utilized local information of the dilated convolution. The decoder is used to reconstruct enhanced frames. The recursive recurrent convolutional network uses GRU to solve the problem of multiple training parameters and complex structures. State-of-the-art results are achieved on two commonly used speech datasets.
AB  - In this paper, we propose a fully convolutional neural network based on recursive recurrent convolution for monaural speech enhancement in the time domain. The proposed network is an encoder-decoder structure using a series of hybrid dilated modules (HDM). The encoder creates low-dimensional features of a noisy input frame. In the HDM, the dilated convolution is used to expand the receptive field of the network model. In contrast, the standard convolution is used to make up for the under-utilized local information of the dilated convolution. The decoder is used to reconstruct enhanced frames. The recursive recurrent convolutional network uses GRU to solve the problem of multiple training parameters and complex structures. State-of-the-art results are achieved on two commonly used speech datasets.
KW  - hybrid dilated convolution
KW  - recurrent convolution
KW  - speech enhancement
KW  - time domain
UR  - http://www.scopus.com/inward/record.url?scp=85128211756&partnerID=8YFLogxK
U2  - 10.3390/app12073461
DO  - 10.3390/app12073461
M3  - Article
AN  - SCOPUS:85128211756
SN  - 2076-3417
VL  - 12
JO  - Applied Sciences (Switzerland)
JF  - Applied Sciences (Switzerland)
IS  - 7
M1  - 3461
ER  - 
