TY - GEN
T1 - DC-TseNet
T2 - 8th International Conference on Orange Technology, ICOT 2020
AU - Fu, Yihui
AU - Sun, Sining
AU - Xie, Lei
N1 - Publisher Copyright: © 2020 IEEE.
PY - 2020/12/18
Y1 - 2020/12/18
N2 - In this paper, we propose an end-to-end dual-channel time-domain speech enhancement approach, named DC-TseNet, for devices with multiple microphones, such as mobile phones used in far-field scenarios like teleconferencing. DC-TseNet incorporates a computationally efficient CNN to form a unified encoder-enhancement-decoder structure that learns clean speech directly from multichannel signals. In addition, DC-TseNet is trained on both intra-channel and inter-channel features, which express the relevance and difference between the signals collected by the two microphones; this makes fuller use of spatial information and reduces the influence of recording direction on the signals. The experimental results show that the proposed dual-channel time-domain approach, with a more compact model size, significantly outperforms the LSTM-based frequency-domain method. Furthermore, we find that the inter-channel information, especially the difference between the two channels, matters more for the performance gain.
AB - In this paper, we propose an end-to-end dual-channel time-domain speech enhancement approach, named DC-TseNet, for devices with multiple microphones, such as mobile phones used in far-field scenarios like teleconferencing. DC-TseNet incorporates a computationally efficient CNN to form a unified encoder-enhancement-decoder structure that learns clean speech directly from multichannel signals. In addition, DC-TseNet is trained on both intra-channel and inter-channel features, which express the relevance and difference between the signals collected by the two microphones; this makes fuller use of spatial information and reduces the influence of recording direction on the signals. The experimental results show that the proposed dual-channel time-domain approach, with a more compact model size, significantly outperforms the LSTM-based frequency-domain method. Furthermore, we find that the inter-channel information, especially the difference between the two channels, matters more for the performance gain.
KW - CNN
KW - DC-TseNet
KW - Dual-channel
KW - Time-domain speech enhancement
UR - http://www.scopus.com/inward/record.url?scp=85112479478&partnerID=8YFLogxK
U2 - 10.1109/ICOT51877.2020.9468808
DO - 10.1109/ICOT51877.2020.9468808
M3 - Conference contribution
AN - SCOPUS:85112479478
T3 - 2020 8th International Conference on Orange Technology, ICOT 2020
BT - 2020 8th International Conference on Orange Technology, ICOT 2020
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 18 December 2020 through 21 December 2020
ER -