DC-TseNet: A dual-channel time-domain speech enhancement network

Yihui Fu; Sining Sun; Lei Xie

doi:10.1109/ICOT51877.2020.9468808

School of Computer Science

Northwestern Polytechnical University Xian

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Peer-reviewed

Abstract

In this paper, we propose an end-to-end dual-channel time-domain speech enhancement approach, named DC-TseNet, for devices with multiple microphones, such as mobile phones used in far-field scenarios like teleconferencing. DC-TseNet incorporates a computationally efficient CNN to form a unified encoder-enhancement-decoder structure that learns clean speech directly from multichannel signals. In addition, DC-TseNet is trained on both intra-channel and inter-channel features to express the relevance and difference between the signals collected by the two microphones, which makes sufficient use of spatial information and reduces the influence of recording direction on the signals. The experimental results show that the proposed dual-channel time-domain approach, with a more compact model size, significantly outperforms the LSTM-based frequency-domain method. Furthermore, we find that the inter-channel information, especially the difference between the two channels, is more important for a better performance gain.

Original language	English
Title of host publication	2020 8th International Conference on Orange Technology, ICOT 2020
Publisher	Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic)	9781665418522
DOI	https://doi.org/10.1109/ICOT51877.2020.9468808
Publication status	Published - 18 Dec 2020
Event	8th International Conference on Orange Technology, ICOT 2020 - Daegu, South Korea; Duration: 18 Dec 2020 → 21 Dec 2020

Publication series

Name	2020 8th International Conference on Orange Technology, ICOT 2020

Conference

Conference	8th International Conference on Orange Technology, ICOT 2020
Country/Territory	South Korea
City	Daegu
Period	18/12/20 → 21/12/20

Cite this

@inproceedings{266d2e39778946e7a6ea279161c951c3,

title = "DC-TseNet: A dual-channel time-domain speech enhancement network",

abstract = "In this paper, we propose an end-to-end dual-channel time-domain speech enhancement approach, named DC-TseNet, for devices with multiple microphones, such as mobile phones used in far-field scenarios like teleconferencing. DC-TseNet incorporates a computationally efficient CNN to form a unified encoder-enhancement-decoder structure that learns clean speech directly from multichannel signals. In addition, DC-TseNet is trained on both intra-channel and inter-channel features to express the relevance and difference between the signals collected by the two microphones, which makes sufficient use of spatial information and reduces the influence of recording direction on the signals. The experimental results show that the proposed dual-channel time-domain approach, with a more compact model size, significantly outperforms the LSTM-based frequency-domain method. Furthermore, we find that the inter-channel information, especially the difference between the two channels, is more important for a better performance gain.",

keywords = "CNN, DC-TseNet, Dual-channel, Time-domain speech enhancement",

author = "Yihui Fu and Sining Sun and Lei Xie",

note = "Publisher Copyright: {\textcopyright} 2020 IEEE.; 8th International Conference on Orange Technology, ICOT 2020 ; Conference date: 18-12-2020 Through 21-12-2020",

year = "2020",

month = dec,

day = "18",

doi = "10.1109/ICOT51877.2020.9468808",

language = "English",

series = "2020 8th International Conference on Orange Technology, ICOT 2020",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

booktitle = "2020 8th International Conference on Orange Technology, ICOT 2020",

}

Fu, Y, Sun, S & Xie, L 2020, DC-TseNet: A dual-channel time-domain speech enhancement network. In 2020 8th International Conference on Orange Technology, ICOT 2020., 9468808, 2020 8th International Conference on Orange Technology, ICOT 2020, Institute of Electrical and Electronics Engineers Inc., 8th International Conference on Orange Technology, ICOT 2020, Daegu, South Korea, 18/12/20. https://doi.org/10.1109/ICOT51877.2020.9468808

DC-TseNet: A dual-channel time-domain speech enhancement network. / Fu, Yihui; Sun, Sining; Xie, Lei.
2020 8th International Conference on Orange Technology, ICOT 2020. Institute of Electrical and Electronics Engineers Inc., 2020. 9468808 (2020 8th International Conference on Orange Technology, ICOT 2020).

TY - GEN

T1 - DC-TseNet

T2 - 8th International Conference on Orange Technology, ICOT 2020

AU - Fu, Yihui

AU - Sun, Sining

AU - Xie, Lei

PY - 2020/12/18

Y1 - 2020/12/18

AB - In this paper, we propose an end-to-end dual-channel time-domain speech enhancement approach, named DC-TseNet, for devices with multiple microphones, such as mobile phones used in far-field scenarios like teleconferencing. DC-TseNet incorporates a computationally efficient CNN to form a unified encoder-enhancement-decoder structure that learns clean speech directly from multichannel signals. In addition, DC-TseNet is trained on both intra-channel and inter-channel features to express the relevance and difference between the signals collected by the two microphones, which makes sufficient use of spatial information and reduces the influence of recording direction on the signals. The experimental results show that the proposed dual-channel time-domain approach, with a more compact model size, significantly outperforms the LSTM-based frequency-domain method. Furthermore, we find that the inter-channel information, especially the difference between the two channels, is more important for a better performance gain.

KW - CNN

KW - DC-TseNet

KW - Dual-channel

KW - Time-domain speech enhancement

UR - http://www.scopus.com/inward/record.url?scp=85112479478&partnerID=8YFLogxK

U2 - 10.1109/ICOT51877.2020.9468808

DO - 10.1109/ICOT51877.2020.9468808

M3 - Conference contribution

AN - SCOPUS:85112479478

T3 - 2020 8th International Conference on Orange Technology, ICOT 2020

BT - 2020 8th International Conference on Orange Technology, ICOT 2020

PB - Institute of Electrical and Electronics Engineers Inc.

Y2 - 18 December 2020 through 21 December 2020

ER -
