The Conversational Short-phrase Speaker Diarization (CSSD) Task: Dataset, Evaluation Metric and Baselines

Gaofeng Cheng; Yifan Chen; Runyan Yang; Qingxuan Li; Zehui Yang; Lingxuan Ye; Pengyuan Zhang; Qingqing Zhang; Lei Xie; Yanmin Qian; Kong Aik Lee; Yonghong Yan

doi:10.1109/ISCSLP57327.2022.10038258

School of Computer Science

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

5 Scopus citations

Abstract

Conversation is one of the most important and most challenging scenarios for speech processing technologies because people in conversation respond to each other in a casual style. Detecting the speech activities of each person in a conversation is vital to downstream tasks such as natural language processing and machine translation. The technology for detecting 'who spoke when' is known as speaker diarization (SD). Diarization error rate (DER) has long been the standard evaluation metric for SD systems. However, DER fails to give enough weight to short conversational phrases, which are brief but important at the semantic level. In addition, a carefully and accurately manually annotated test dataset suitable for evaluating conversational SD technologies is still unavailable in the speech community. In this paper, we design and describe the Conversational Short-phrase Speaker Diarization (CSSD) task, which consists of training and testing datasets, an evaluation metric, and baselines. For the dataset, in addition to the previously open-sourced 180-hour conversational MagicData-RAMC dataset, we prepare a separate 20-hour conversational speech test dataset with carefully and manually verified speaker timestamp annotations for the CSSD task. For the metric, we design the new conversational DER (CDER) evaluation metric, which calculates SD accuracy at the utterance level. For the baseline, we adopt a commonly used method, the Variational Bayes HMM x-vector system. Our evaluation metric is publicly available at https://github.com/SpeechClub/CDER_Metric.
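To illustrate why a duration-weighted score can hide errors on short phrases, the following is a minimal sketch and not the CDER implementation from the repository above. It assumes a simplified setting: speaker labels are already mapped between reference and hypothesis, there is no overlapped speech, no collar, and false alarms are ignored. The function names, the 50% majority threshold, and the toy segments are all hypothetical choices made for this example.

```python
from typing import List, Tuple

# A segment is (start_sec, end_sec, speaker_label).
Segment = Tuple[float, float, str]


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def correct_time(ref: Segment, hyp: List[Segment]) -> float:
    """Seconds of the reference segment that the hypothesis gives to the same speaker.

    Assumes hypothesis segments of one speaker do not overlap each other.
    """
    start, end, spk = ref
    return sum(overlap(start, end, hs, he) for hs, he, hspk in hyp if hspk == spk)


def duration_weighted_error(ref: List[Segment], hyp: List[Segment]) -> float:
    """Simplified DER-style score: wrongly attributed time over total reference time."""
    total = sum(end - start for start, end, _ in ref)
    wrong = sum((end - start) - correct_time((start, end, spk), hyp)
                for start, end, spk in ref)
    return wrong / total


def utterance_level_error(ref: List[Segment], hyp: List[Segment]) -> float:
    """Hypothetical utterance-level score: each reference utterance counts once.

    An utterance is counted as wrong when less than half of its duration
    is attributed to the correct speaker (an assumed threshold).
    """
    wrong = sum(1 for start, end, spk in ref
                if correct_time((start, end, spk), hyp) < 0.5 * (end - start))
    return wrong / len(ref)


# Toy conversation: speaker B gives a 0.5 s response between two long turns of A.
ref = [(0.0, 10.0, "A"), (10.0, 10.5, "B"), (10.5, 20.0, "A")]
# The hypothesis misses the short phrase and assigns everything to A.
hyp = [(0.0, 20.0, "A")]

print(f"duration-weighted error: {duration_weighted_error(ref, hyp):.3f}")
print(f"utterance-level error:   {utterance_level_error(ref, hyp):.3f}")
```

In this toy case the missed short phrase accounts for 0.5 s of 20 s of speech, so the duration-weighted score stays small, while the utterance-level score counts it as one of three utterances.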

Original language	English
Title of host publication	2022 13th International Symposium on Chinese Spoken Language Processing, ISCSLP 2022
Editors	Kong Aik Lee, Hung-yi Lee, Yanfeng Lu, Minghui Dong
Publisher	Institute of Electrical and Electronics Engineers Inc.
Pages	488-492
Number of pages	5
ISBN (Electronic)	9798350397963
DOIs	https://doi.org/10.1109/ISCSLP57327.2022.10038258
State	Published - 2022
Event	13th International Symposium on Chinese Spoken Language Processing, ISCSLP 2022 - Singapore, Singapore Duration: 11 Dec 2022 → 14 Dec 2022

Publication series

Name	2022 13th International Symposium on Chinese Spoken Language Processing, ISCSLP 2022

Conference

Conference	13th International Symposium on Chinese Spoken Language Processing, ISCSLP 2022
Country/Territory	Singapore
City	Singapore
Period	11/12/22 → 14/12/22

Keywords

conversational speech
short-phrase
speaker diarization

Access to Document

10.1109/ISCSLP57327.2022.10038258

Cite this

Cheng, G., Chen, Y., Yang, R., Li, Q., Yang, Z., Ye, L., Zhang, P., Zhang, Q., Xie, L., Qian, Y., Lee, K. A., & Yan, Y. (2022). The Conversational Short-phrase Speaker Diarization (CSSD) Task: Dataset, Evaluation Metric and Baselines. In K. A. Lee, H. Lee, Y. Lu, & M. Dong (Eds.), 2022 13th International Symposium on Chinese Spoken Language Processing, ISCSLP 2022 (pp. 488-492). (2022 13th International Symposium on Chinese Spoken Language Processing, ISCSLP 2022). Institute of Electrical and Electronics Engineers Inc.. https://doi.org/10.1109/ISCSLP57327.2022.10038258

Cheng, Gaofeng ; Chen, Yifan ; Yang, Runyan et al. / The Conversational Short-phrase Speaker Diarization (CSSD) Task : Dataset, Evaluation Metric and Baselines. 2022 13th International Symposium on Chinese Spoken Language Processing, ISCSLP 2022. editor / Kong Aik Lee ; Hung-yi Lee ; Yanfeng Lu ; Minghui Dong. Institute of Electrical and Electronics Engineers Inc., 2022. pp. 488-492 (2022 13th International Symposium on Chinese Spoken Language Processing, ISCSLP 2022).

@inproceedings{597eff94cb354b1db88cddc16b460775,

title = "The Conversational Short-phrase Speaker Diarization (CSSD) Task: Dataset, Evaluation Metric and Baselines",

keywords = "conversational speech, short-phrase, speaker diarization",

author = "Gaofeng Cheng and Yifan Chen and Runyan Yang and Qingxuan Li and Zehui Yang and Lingxuan Ye and Pengyuan Zhang and Qingqing Zhang and Lei Xie and Yanmin Qian and Lee, {Kong Aik} and Yonghong Yan",

note = "Publisher Copyright: {\textcopyright} 2022 IEEE.; 13th International Symposium on Chinese Spoken Language Processing, ISCSLP 2022 ; Conference date: 11-12-2022 Through 14-12-2022",

year = "2022",

doi = "10.1109/ISCSLP57327.2022.10038258",

language = "English",

series = "2022 13th International Symposium on Chinese Spoken Language Processing, ISCSLP 2022",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

pages = "488--492",

editor = "Lee, {Kong Aik} and Hung-yi Lee and Yanfeng Lu and Minghui Dong",

booktitle = "2022 13th International Symposium on Chinese Spoken Language Processing, ISCSLP 2022",

}

Cheng, G, Chen, Y, Yang, R, Li, Q, Yang, Z, Ye, L, Zhang, P, Zhang, Q, Xie, L, Qian, Y, Lee, KA & Yan, Y 2022, The Conversational Short-phrase Speaker Diarization (CSSD) Task: Dataset, Evaluation Metric and Baselines. in KA Lee, H Lee, Y Lu & M Dong (eds), 2022 13th International Symposium on Chinese Spoken Language Processing, ISCSLP 2022. 2022 13th International Symposium on Chinese Spoken Language Processing, ISCSLP 2022, Institute of Electrical and Electronics Engineers Inc., pp. 488-492, 13th International Symposium on Chinese Spoken Language Processing, ISCSLP 2022, Singapore, Singapore, 11/12/22. https://doi.org/10.1109/ISCSLP57327.2022.10038258

The Conversational Short-phrase Speaker Diarization (CSSD) Task: Dataset, Evaluation Metric and Baselines. / Cheng, Gaofeng; Chen, Yifan; Yang, Runyan et al.
2022 13th International Symposium on Chinese Spoken Language Processing, ISCSLP 2022. ed. / Kong Aik Lee; Hung-yi Lee; Yanfeng Lu; Minghui Dong. Institute of Electrical and Electronics Engineers Inc., 2022. p. 488-492 (2022 13th International Symposium on Chinese Spoken Language Processing, ISCSLP 2022).

TY - GEN

T1 - The Conversational Short-phrase Speaker Diarization (CSSD) Task

T2 - 13th International Symposium on Chinese Spoken Language Processing, ISCSLP 2022

AU - Cheng, Gaofeng

AU - Chen, Yifan

AU - Yang, Runyan

AU - Li, Qingxuan

AU - Yang, Zehui

AU - Ye, Lingxuan

AU - Zhang, Pengyuan

AU - Zhang, Qingqing

AU - Xie, Lei

AU - Qian, Yanmin

AU - Lee, Kong Aik

AU - Yan, Yonghong

PY - 2022

Y1 - 2022

KW - conversational speech

KW - short-phrase

KW - speaker diarization

UR - http://www.scopus.com/inward/record.url?scp=85148582516&partnerID=8YFLogxK

U2 - 10.1109/ISCSLP57327.2022.10038258

DO - 10.1109/ISCSLP57327.2022.10038258

M3 - Conference contribution

AN - SCOPUS:85148582516

T3 - 2022 13th International Symposium on Chinese Spoken Language Processing, ISCSLP 2022

SP - 488

EP - 492

BT - 2022 13th International Symposium on Chinese Spoken Language Processing, ISCSLP 2022

A2 - Lee, Kong Aik

A2 - Lee, Hung-yi

A2 - Lu, Yanfeng

A2 - Dong, Minghui

PB - Institute of Electrical and Electronics Engineers Inc.

Y2 - 11 December 2022 through 14 December 2022

ER -

Cheng G, Chen Y, Yang R, Li Q, Yang Z, Ye L et al. The Conversational Short-phrase Speaker Diarization (CSSD) Task: Dataset, Evaluation Metric and Baselines. In Lee KA, Lee H, Lu Y, Dong M, editors, 2022 13th International Symposium on Chinese Spoken Language Processing, ISCSLP 2022. Institute of Electrical and Electronics Engineers Inc. 2022. p. 488-492. (2022 13th International Symposium on Chinese Spoken Language Processing, ISCSLP 2022). doi: 10.1109/ISCSLP57327.2022.10038258
