TY - GEN
T1 - The Conversational Short-phrase Speaker Diarization (CSSD) Task
T2 - 13th International Symposium on Chinese Spoken Language Processing, ISCSLP 2022
AU - Cheng, Gaofeng
AU - Chen, Yifan
AU - Yang, Runyan
AU - Li, Qingxuan
AU - Yang, Zehui
AU - Ye, Lingxuan
AU - Zhang, Pengyuan
AU - Zhang, Qingqing
AU - Xie, Lei
AU - Qian, Yanmin
AU - Lee, Kong Aik
AU - Yan, Yonghong
N1 - Publisher Copyright: © 2022 IEEE.
PY - 2022
Y1 - 2022
AB - Conversation is one of the most important and most challenging scenarios for speech processing technologies, because people in conversation respond to each other in a casual style. Detecting the speech activity of each person in a conversation is vital to downstream tasks such as natural language processing and machine translation. The technology that determines 'who spoke when' is known as speaker diarization (SD). Diarization error rate (DER) has long been the standard evaluation metric for SD systems. However, DER gives too little weight to short conversational phrases, which are brief but important at the semantic level. Moreover, the speech community still lacks a carefully and accurately hand-annotated test dataset suitable for evaluating conversational SD technologies. In this paper, we design and describe the Conversational Short-phrase Speaker Diarization (CSSD) task, which consists of training and test datasets, an evaluation metric, and baselines. For the dataset, in addition to the previously open-sourced 180-hour conversational MagicData-RAMC dataset, we prepare a separate 20-hour conversational speech test dataset with carefully hand-verified speaker timestamp annotations for the CSSD task. For the metric, we design a new conversational DER (CDER) evaluation metric, which calculates SD accuracy at the utterance level. For the baseline, we adopt a commonly used method, the Variational Bayes HMM x-vector system, as the baseline of the CSSD task. Our evaluation metric is publicly available at https://github.com/SpeechClub/CDER_Metric.
KW - conversational speech
KW - short-phrase
KW - speaker diarization
UR - http://www.scopus.com/inward/record.url?scp=85148582516&partnerID=8YFLogxK
U2 - 10.1109/ISCSLP57327.2022.10038258
DO - 10.1109/ISCSLP57327.2022.10038258
M3 - Conference contribution
AN - SCOPUS:85148582516
T3 - 2022 13th International Symposium on Chinese Spoken Language Processing, ISCSLP 2022
SP - 488
EP - 492
BT - 2022 13th International Symposium on Chinese Spoken Language Processing, ISCSLP 2022
A2 - Lee, Kong Aik
A2 - Lee, Hung-yi
A2 - Lu, Yanfeng
A2 - Dong, Minghui
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 11 December 2022 through 14 December 2022
ER -