Serial-Parallel Dual-Path Architecture for Speaking Style Recognition

  • Guojian Li
  • Qijie Shao
  • Zhixian Zhao
  • Shuiyuan Wang
  • Zhonghua Fu
  • Lei Xie

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

Speaking Style Recognition (SSR) identifies a speaker’s speaking style characteristics from speech. Existing style recognition approaches rely primarily on linguistic information and integrate little acoustic information, which limits recognition accuracy. Fusing the acoustic and linguistic modalities offers significant potential to improve recognition performance. In this paper, we propose a novel serial-parallel dual-path architecture for SSR that leverages acoustic-linguistic bimodal information. The serial path follows the ASR+STYLE serial paradigm, reflecting a sequential temporal dependency, while the parallel path integrates our Acoustic-Linguistic Similarity Module (ALSM) to facilitate cross-modal interaction with temporal simultaneity. Compared with OSUM, the existing SSR baseline, our approach reduces parameter size by 88.4% and achieves a 30.3% improvement in SSR accuracy for eight styles on the test set.
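
The abstract does not specify how the ALSM computes acoustic-linguistic similarity. As a rough illustration only of what a cross-modal similarity step can look like, the sketch below scores every acoustic frame against every linguistic token with cosine similarity and then uses softmax weights to build one linguistic summary per frame. The function names, the toy two-dimensional features, and the choice of cosine similarity with softmax weighting are all assumptions made for this example; this is not the authors' module.

```python
import math

def cosine_similarity_matrix(acoustic, linguistic):
    """Return S where S[i][j] is the cosine similarity between
    acoustic frame i and linguistic token j (hypothetical helper)."""
    def unit(v):
        norm = math.sqrt(sum(x * x for x in v))
        return [x / norm for x in v] if norm > 0 else list(v)

    a_unit = [unit(v) for v in acoustic]
    l_unit = [unit(v) for v in linguistic]
    return [[sum(x * y for x, y in zip(a, l)) for l in l_unit] for a in a_unit]

def attend(acoustic, linguistic):
    """For each acoustic frame, weight the linguistic tokens by a softmax
    over their similarities and return (similarity, fused vectors).
    Assumption: a generic attention-style fusion, not the paper's ALSM."""
    sim = cosine_similarity_matrix(acoustic, linguistic)
    dim = len(linguistic[0])
    fused = []
    for row in sim:
        exps = [math.exp(s) for s in row]
        total = sum(exps)
        weights = [e / total for e in exps]
        fused.append(
            [sum(w * tok[d] for w, tok in zip(weights, linguistic)) for d in range(dim)]
        )
    return sim, fused

# Toy features: 3 acoustic frames and 2 linguistic tokens, both 2-dimensional.
acoustic_frames = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
linguistic_tokens = [[3.0, 0.0], [0.0, 1.0]]
similarity, fused = attend(acoustic_frames, linguistic_tokens)
```

Here `similarity` has one row per acoustic frame and one column per linguistic token, and `fused` holds one similarity-weighted linguistic vector per frame. In a real system both inputs would come from learned encoders projected into a shared dimension.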

Original language: English
Title of host publication: Man-Machine Speech Communication - 20th National Conference, NCMMSC 2025, Proceedings
Editors: Jia Jia, Zhiyong Wu, Lijian Gao, Gongping Huang, Ya Li
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 241-254
Number of pages: 14
ISBN (Print): 9789819553815
State: Published - 2026
Event: 20th National Conference on Man-Machine Speech Communication, NCMMSC 2025 - Zhenjiang, China
Duration: 16 Oct 2025 to 19 Oct 2025

Publication series

Name: Communications in Computer and Information Science
Volume: 2662 CCIS
ISSN (Print): 1865-0929
ISSN (Electronic): 1865-0937

Conference

Conference: 20th National Conference on Man-Machine Speech Communication, NCMMSC 2025
Country/Territory: China
City: Zhenjiang
Period: 16/10/25 to 19/10/25

Keywords

  • ASR + STYLE
  • Acoustic-Linguistic Similarity
  • Cross-Modal
  • Serial-Parallel
  • Speaking Style Recognition
