Towards Robust Overlapping Speech Detection: A Speaker-Aware Progressive Approach Using WavLM

  • Zhaokai Sun
  • Li Zhang
  • Qing Wang
  • Pan Zhou
  • Lei Xie

Research output: Contribution to journal > Conference article > peer-review

Abstract

Overlapping Speech Detection (OSD) aims to identify regions where multiple speakers overlap in a conversation, a critical challenge in multi-party speech processing. This work proposes a speaker-aware progressive OSD model that leverages a progressive training strategy to enhance the correlation between subtasks such as voice activity detection (VAD) and overlap detection. To improve acoustic representation, we explore the effectiveness of state-of-the-art self-supervised learning (SSL) models, including WavLM and wav2vec 2.0, while incorporating a speaker attention module to enrich features with frame-level speaker information. Experimental results show that the proposed method achieves state-of-the-art performance, with an F1 score of 82.76% on the AMI test set, demonstrating its robustness and effectiveness in OSD.
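
To make the task definition above concrete, the following minimal sketch derives frame-level VAD and OSD targets from per-frame active-speaker counts and scores a hypothetical overlap prediction with F1. It is not the authors' model or evaluation code: the function names, the toy frame sequence, and the choice of a frame-level F1 on the overlap class are assumptions made for illustration, since the abstract does not state how the reported F1 is computed.

```python
from typing import List, Tuple


def labels_from_speaker_counts(counts: List[int]) -> Tuple[List[int], List[int]]:
    """Derive binary frame-level targets from active-speaker counts.

    VAD is 1 when at least one speaker is active; OSD is 1 when at
    least two speakers are active (illustrative convention).
    """
    vad = [1 if c >= 1 else 0 for c in counts]
    osd = [1 if c >= 2 else 0 for c in counts]
    return vad, osd


def f1_score(reference: List[int], hypothesis: List[int]) -> float:
    """Frame-level F1 for the positive class (assumed metric definition)."""
    pairs = list(zip(reference, hypothesis))
    tp = sum(1 for r, h in pairs if r == 1 and h == 1)
    fp = sum(1 for r, h in pairs if r == 0 and h == 1)
    fn = sum(1 for r, h in pairs if r == 1 and h == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy example: number of active speakers in each of eight frames.
counts = [0, 1, 1, 2, 2, 3, 1, 0]
vad_ref, osd_ref = labels_from_speaker_counts(counts)

# Hypothetical overlap predictions from some detector.
osd_hyp = [0, 0, 1, 1, 1, 0, 0, 0]
score = f1_score(osd_ref, osd_hyp)
print(f"OSD frame-level F1: {score:.4f}")
```

Every overlap frame is also a speech frame, so the OSD targets are a subset of the VAD targets. That nesting is one plausible reading of why the two subtasks are correlated.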

Original language: English
Pages (from-to): 1653-1657
Number of pages: 5
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
DOIs
State: Published - 2025
Event: 26th Interspeech Conference 2025 - Rotterdam, Netherlands
Duration: 17 Aug 2025 to 21 Aug 2025

Keywords

  • multi-task learning
  • overlapped speech detection
  • speaker recognition
