Single-channel deep time-domain speech enhancement networks for cabin environments (面向舱室声学环境的深度时域语音增强网络)

Lin Zhang; Haitao Wang; Shuang Yang; Xiangyang Zeng; Ke'an Chen

School of Marine Science and Technology

Northwestern Polytechnical University, Xi'an

Research output: Contribution to journal › Article › peer-review

2 Citations (Scopus)

Abstract

A deep time-domain speech enhancement network that combines parallel dilated convolution with group convolution is designed for single-channel speech enhancement in cabin environments. The network builds on the classical convolutional time-domain audio separation network (Conv-TasNet). In the enhancement layer, parallel dilated convolutions with different dilation factors process long-duration signals, extracting more of the low-frequency information contained in the signal envelope and suppressing the time-delay problem caused by noise reverberation. At the same time, speech detail is preserved, and the harmonic information of speech and background noise contained in the waveform is extracted more accurately. In addition, group convolution offsets the growth in network size caused by the parallel convolutions, so the network keeps a small size and low computational complexity while still enhancing speech well. Experiments on multiple types of aircraft cabin noise show that the designed network module improves the objective metrics relative to the baseline network. Comparisons with other common networks show that the method achieves better subjective and objective speech enhancement scores on cabin-environment data, with lower distortion in the line-spectrum and narrow-band regions where noise levels are high.
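The trade-off described in the abstract can be illustrated with a little arithmetic. The sketch below is not the authors' implementation: the channel count (512), kernel size (3), dilation factors (1, 2, 4, 8) and group count (4) are assumed values chosen only for illustration, and the helper names are hypothetical. It shows how a larger dilation widens the span of input samples seen by one convolution, and how grouping cuts the weight count of the parallel branches.

```python
"""Back-of-the-envelope sketch of parallel dilated + group convolution.

All sizes below (512 channels, kernel 3, dilations 1/2/4/8, 4 groups)
are illustrative assumptions, not values reported in the paper.
"""


def receptive_field(kernel_size, dilation):
    """Number of input samples spanned by one dilated 1-D convolution."""
    return dilation * (kernel_size - 1) + 1


def conv1d_weights(in_ch, out_ch, kernel_size, groups=1):
    """Weight count of a bias-free 1-D convolution split into `groups`."""
    if in_ch % groups or out_ch % groups:
        raise ValueError("channel counts must be divisible by groups")
    return (in_ch // groups) * out_ch * kernel_size


def dilated_conv1d(signal, kernel, dilation):
    """Single-channel dilated convolution (no padding, no kernel flip)."""
    span = dilation * (len(kernel) - 1)
    return [
        sum(w * signal[t + k * dilation] for k, w in enumerate(kernel))
        for t in range(len(signal) - span)
    ]


# Assumed sizes, for illustration only.
CHANNELS, KERNEL, DILATIONS, GROUPS = 512, 3, (1, 2, 4, 8), 4

fields = [receptive_field(KERNEL, d) for d in DILATIONS]
single_dense = conv1d_weights(CHANNELS, CHANNELS, KERNEL)
parallel_dense = len(DILATIONS) * single_dense
parallel_grouped = len(DILATIONS) * conv1d_weights(
    CHANNELS, CHANNELS, KERNEL, GROUPS
)

print("receptive fields:", fields)
print("one dense conv:", single_dense)
print("parallel, dense:", parallel_dense)
print("parallel, grouped:", parallel_grouped)
```

Under these assumed sizes, four grouped branches need as many weights as a single dense convolution, while the widest branch spans 17 input samples instead of 3.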

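Translated title of the contribution	Single-channel deep time-domain speech enhancement networks for cabin environments
Original language	Chinese (Traditional)
Pages (from-to)	890-900
Number of pages	11
Journal	Shengxue Xuebao/Acta Acustica
Volume	48
Issue number	4
Publication status	Published - Jul 2023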

Keywords

Cabin environment
Deep network
Group convolution
Parallel dilated convolution
Single channel speech enhancement

Other files and links

Link to publication in Scopus

Cite this

@article{fde909d0b68e45d992a8a46f6911d4f3,
title = "面向舱室声学环境的深度时域语音增强网络",
keywords = "Cabin environment, Deep network, Group convolution, Parallel dilated convolution, Single channel speech enhancement",
author = "Lin Zhang and Haitao Wang and Shuang Yang and Xiangyang Zeng and Ke'an Chen",
year = "2023",
month = jul,
language = "Chinese (Traditional)",
volume = "48",
pages = "890--900",
journal = "Shengxue Xuebao/Acta Acustica",
issn = "0371-0025",
publisher = "Science Press",
number = "4",
}

TY - JOUR
T1 - 面向舱室声学环境的深度时域语音增强网络
AU - Zhang, Lin
AU - Wang, Haitao
AU - Yang, Shuang
AU - Zeng, Xiangyang
AU - Chen, Ke'an
PY - 2023/7
Y1 - 2023/7
KW - Cabin environment
KW - Deep network
KW - Group convolution
KW - Parallel dilated convolution
KW - Single channel speech enhancement
UR - http://www.scopus.com/inward/record.url?scp=85170229108&partnerID=8YFLogxK
M3 - Article
AN - SCOPUS:85170229108
SN - 0371-0025
VL - 48
SP - 890
EP - 900
JO - Shengxue Xuebao/Acta Acustica
JF - Shengxue Xuebao/Acta Acustica
IS - 4
ER -
