Hybrid Dilated and Recursive Recurrent Convolution Network for Time-Domain Speech Enhancement

Zhendong Song; Yupeng Ma; Fang Tan; Xiaoyi Feng

doi:10.3390/app12073461

School of Electronics and Information

Research output: Contribution to journal › Article › peer-review

10 Scopus citations

Abstract

In this paper, we propose a fully convolutional neural network based on recursive recurrent convolution for monaural speech enhancement in the time domain. The proposed network has an encoder-decoder structure built around a series of hybrid dilated modules (HDMs). The encoder creates low-dimensional features from a noisy input frame. In each HDM, dilated convolution expands the receptive field of the network, while standard convolution compensates for the local information that dilated convolution under-utilizes. The decoder reconstructs the enhanced frames. The recursive recurrent convolutional network uses gated recurrent units (GRUs) to address the problems of a large number of training parameters and a complex structure. State-of-the-art results are achieved on two commonly used speech datasets.
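To illustrate the two convolution types the abstract contrasts, here is a minimal pure-Python sketch. It is not the authors' implementation. It assumes a stride-1, zero-padded 1-D convolution. The function names `conv1d`, `hybrid_dilated_block` and `receptive_field`, the kernel weights, and the choice to combine the dilated and standard branches by summation are hypothetical illustrations, not details taken from the paper.

```python
from typing import List, Sequence


def conv1d(x: Sequence[float], w: Sequence[float], dilation: int = 1) -> List[float]:
    """Stride-1, 'same'-length 1-D convolution (cross-correlation, as in most
    deep-learning libraries) with zero padding; kernel taps sit `dilation`
    samples apart."""
    span = (len(w) - 1) * dilation  # distance from first to last tap
    left = span // 2                # centre the kernel on each output sample
    out = []
    for t in range(len(x)):
        acc = 0.0
        for j, wj in enumerate(w):
            idx = t - left + j * dilation
            if 0 <= idx < len(x):   # samples outside the signal count as zero
                acc += wj * x[idx]
        out.append(acc)
    return out


def hybrid_dilated_block(x, w_dilated, w_standard, dilation):
    """Hypothetical HDM-style block: a dilated branch for wide context plus a
    standard (dilation 1) branch for dense local context, combined by summation
    (an assumption, not the paper's stated design)."""
    wide = conv1d(x, w_dilated, dilation)
    local = conv1d(x, w_standard, 1)
    return [a + b for a, b in zip(wide, local)]


def receptive_field(kernel_sizes: Sequence[int], dilations: Sequence[int]) -> int:
    """Receptive field, in samples, of a sequential stack of stride-1 convolutions."""
    return 1 + sum((k - 1) * d for k, d in zip(kernel_sizes, dilations))


impulse = [0.0] * 9
impulse[4] = 1.0
print(conv1d(impulse, [1, 1, 1], dilation=2))
# [0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0]  (gaps at samples 3 and 5)
print(hybrid_dilated_block(impulse, [1, 1, 1], [1, 1, 1], dilation=2))
# [0.0, 0.0, 1.0, 1.0, 2.0, 1.0, 1.0, 0.0, 0.0]  (standard branch fills the gaps)
print(receptive_field([3] * 4, [1, 2, 4, 8]), receptive_field([3] * 4, [1] * 4))
# 31 9
```

With a 3-tap kernel and dilation 2, an impulse reaches only every other output sample. The standard branch fills in the skipped neighbours, which corresponds to the local information the abstract says dilated convolution under-utilizes. Stacking 3-tap layers with dilations 1, 2, 4 and 8 gives a 31-sample receptive field, compared with 9 samples for four standard layers with the same parameter count.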

Original language	English
Article number	3461
Journal	Applied Sciences (Switzerland)
Volume	12
Issue number	7
DOIs	https://doi.org/10.3390/app12073461
State	Published - 1 Apr 2022

Keywords

hybrid dilated convolution
recurrent convolution
speech enhancement
time domain

Cite this

@article{9a9b2f63938f4e92b7d9b51a7dae958f,

title = "Hybrid Dilated and Recursive Recurrent Convolution Network for Time-Domain Speech Enhancement",

abstract = "In this paper, we propose a fully convolutional neural network based on recursive recurrent convolution for monaural speech enhancement in the time domain. The proposed network is an encoder-decoder structure using a series of hybrid dilated modules (HDM). The encoder creates low-dimensional features of a noisy input frame. In the HDM, the dilated convolution is used to expand the receptive field of the network model. In contrast, the standard convolution is used to make up for the under-utilized local information of the dilated convolution. The decoder is used to reconstruct enhanced frames. The recursive recurrent convolutional network uses GRU to solve the problem of multiple training parameters and complex structures. State-of-the-art results are achieved on two commonly used speech datasets.",

keywords = "hybrid dilated convolution, recurrent convolution, speech enhancement, time domain",

author = "Zhendong Song and Yupeng Ma and Fang Tan and Xiaoyi Feng",

note = "Publisher Copyright: {\textcopyright} 2022 by the authors. Licensee MDPI, Basel, Switzerland.",

year = "2022",

month = apr,

day = "1",

doi = "10.3390/app12073461",

language = "English",

volume = "12",

journal = "Applied Sciences (Switzerland)",

issn = "2076-3417",

publisher = "Multidisciplinary Digital Publishing Institute (MDPI)",

number = "7",

eid = "3461",

}

TY - JOUR

T1 - Hybrid Dilated and Recursive Recurrent Convolution Network for Time-Domain Speech Enhancement

AU - Song, Zhendong

AU - Ma, Yupeng

AU - Tan, Fang

AU - Feng, Xiaoyi

PY - 2022/4/1

Y1 - 2022/4/1

N2 - In this paper, we propose a fully convolutional neural network based on recursive recurrent convolution for monaural speech enhancement in the time domain. The proposed network is an encoder-decoder structure using a series of hybrid dilated modules (HDM). The encoder creates low-dimensional features of a noisy input frame. In the HDM, the dilated convolution is used to expand the receptive field of the network model. In contrast, the standard convolution is used to make up for the under-utilized local information of the dilated convolution. The decoder is used to reconstruct enhanced frames. The recursive recurrent convolutional network uses GRU to solve the problem of multiple training parameters and complex structures. State-of-the-art results are achieved on two commonly used speech datasets.

AB - In this paper, we propose a fully convolutional neural network based on recursive recurrent convolution for monaural speech enhancement in the time domain. The proposed network is an encoder-decoder structure using a series of hybrid dilated modules (HDM). The encoder creates low-dimensional features of a noisy input frame. In the HDM, the dilated convolution is used to expand the receptive field of the network model. In contrast, the standard convolution is used to make up for the under-utilized local information of the dilated convolution. The decoder is used to reconstruct enhanced frames. The recursive recurrent convolutional network uses GRU to solve the problem of multiple training parameters and complex structures. State-of-the-art results are achieved on two commonly used speech datasets.

KW - hybrid dilated convolution

KW - recurrent convolution

KW - speech enhancement

KW - time domain

UR - http://www.scopus.com/inward/record.url?scp=85128211756&partnerID=8YFLogxK

U2 - 10.3390/app12073461

DO - 10.3390/app12073461

M3 - Article

AN - SCOPUS:85128211756

SN - 2076-3417

VL - 12

JO - Applied Sciences (Switzerland)

JF - Applied Sciences (Switzerland)

IS - 7

M1 - 3461

ER -
