
Serial-Parallel Dual-Path Architecture for Speaking Style Recognition

  • Guojian Li
  • Qijie Shao
  • Zhixian Zhao
  • Shuiyuan Wang
  • Zhonghua Fu
  • Lei Xie

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-reviewed

Abstract

Speaking Style Recognition (SSR) identifies a speaker’s speaking style characteristics from speech. Existing style recognition approaches primarily rely on linguistic information, with limited integration of acoustic information, which restricts recognition accuracy improvements. The fusion of acoustic and linguistic modalities offers significant potential to enhance recognition performance. In this paper, we propose a novel serial-parallel dual-path architecture for SSR that leverages acoustic-linguistic bimodal information. The serial path follows the ASR+STYLE serial paradigm, reflecting a sequential temporal dependency, while the parallel path integrates our designed Acoustic-Linguistic Similarity Module (ALSM) to facilitate cross-modal interaction with temporal simultaneity. Compared to the existing SSR baseline, the OSUM model, our approach reduces parameter size by 88.4% and achieves a 30.3% improvement in SSR accuracy for eight styles on the test set.

Original language: English
Title of host publication: Man-Machine Speech Communication - 20th National Conference, NCMMSC 2025, Proceedings
Editors: Jia Jia, Zhiyong Wu, Lijian Gao, Gongping Huang, Ya Li
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 241-254
Number of pages: 14
ISBN (Print): 9789819553815
DOI
Publication status: Published - 2026
Event: 20th National Conference on Man-Machine Speech Communication, NCMMSC 2025 - Zhenjiang, China
Duration: 16 Oct 2025 to 19 Oct 2025

Publication series

Name: Communications in Computer and Information Science
Volume: 2662 CCIS
ISSN (Print): 1865-0929
ISSN (Electronic): 1865-0937

Conference

Conference: 20th National Conference on Man-Machine Speech Communication, NCMMSC 2025
Country/Territory: China
City: Zhenjiang
Period: 16/10/25 to 19/10/25
