TY - GEN
T1 - A Pitch-aware Approach to Single-channel Speech Separation
AU - Wang, Ke
AU - Soong, Frank
AU - Xie, Lei
N1 - Publisher Copyright:
© 2019 IEEE.
PY - 2019/5
Y1 - 2019/5
N2 - Despite significant advances in deep learning for separating speech sources mixed in a single channel, same-gender speaker mixtures, i.e., male-male or female-female, remain more difficult to separate than opposite-gender mixtures. In this study, we propose a pitch-aware approach to improve speech separation performance. The proposed approach performs speech separation in the following steps: 1) training a pre-separation model to separate the mixed sources; 2) training a pitch-tracking network to perform polyphonic pitch tracking; 3) incorporating the estimated pitch into the final pitch-aware speech separation. Experimental results on the public WSJ0-2mix dataset show that the new approach improves separation performance for both same-gender and opposite-gender mixtures. The improved signal-to-distortion ratio (SDR) of 12.0 dB is the best reported result without using any phase enhancement.
AB - Despite significant advances in deep learning for separating speech sources mixed in a single channel, same-gender speaker mixtures, i.e., male-male or female-female, remain more difficult to separate than opposite-gender mixtures. In this study, we propose a pitch-aware approach to improve speech separation performance. The proposed approach performs speech separation in the following steps: 1) training a pre-separation model to separate the mixed sources; 2) training a pitch-tracking network to perform polyphonic pitch tracking; 3) incorporating the estimated pitch into the final pitch-aware speech separation. Experimental results on the public WSJ0-2mix dataset show that the new approach improves separation performance for both same-gender and opposite-gender mixtures. The improved signal-to-distortion ratio (SDR) of 12.0 dB is the best reported result without using any phase enhancement.
KW - deep clustering
KW - permutation invariant training
KW - pitch tracking
KW - speech separation
UR - http://www.scopus.com/inward/record.url?scp=85068959869&partnerID=8YFLogxK
U2 - 10.1109/ICASSP.2019.8683138
DO - 10.1109/ICASSP.2019.8683138
M3 - Conference contribution
AN - SCOPUS:85068959869
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 296
EP - 300
BT - 2019 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2019 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 44th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2019
Y2 - 12 May 2019 through 17 May 2019
ER -