Cross-Speaker Emotion Transfer Through Information Perturbation in Emotional Speech Synthesis

  • Yi Lei
  • Shan Yang
  • Xinfa Zhu
  • Lei Xie
  • Dan Su
  • Northwestern Polytechnical University Xian

Research output: Contribution to journal › Article › peer-review

18 Scopus citations

Abstract

By borrowing emotional expressions from an emotional source speaker, cross-speaker emotion transfer is an effective way to produce emotional speech for target speakers who have no emotional training data. Because the emotion and timbre of the source speaker are heavily entangled in speech, existing approaches often struggle to balance speaker similarity against emotional expression in the synthetic speech of the target speaker. In this letter, we propose to disentangle timbre and emotion through information perturbation for cross-speaker emotion transfer, an approach that effectively learns the emotional expression of the source speaker while maintaining the timbre of the target speaker. Specifically, we separately perturb the timbre- and emotion-related features (e.g., formant and pitch) of the source speech to obtain and model timbre- and emotion-independent signals, based on which the proposed model can deliver emotional expression for target speakers. Experimental results demonstrate that the proposed approach significantly outperforms the baselines in terms of naturalness and similarity, indicating the effectiveness of information perturbation for cross-speaker emotion transfer.
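As a rough illustration of what perturbing pitch and formant information can look like, the following Python sketch randomizes a frame-level F0 contour and warps the frequency axis of a spectral envelope. It is not taken from the letter: the function names `perturb_pitch` and `perturb_formants`, the parameter ranges, and the use of simple linear interpolation are all assumptions chosen for illustration, and the authors' actual perturbation functions may differ.

```python
import numpy as np


def perturb_pitch(f0, rng, max_shift_semitones=4.0, range_scale=(0.5, 1.5)):
    """Randomly shift and rescale an F0 contour in Hz (hypothetical helper).

    Frames with f0 == 0 are treated as unvoiced and left at 0.
    """
    f0 = np.asarray(f0, dtype=float)
    voiced = f0 > 0
    out = np.zeros_like(f0)
    if not voiced.any():
        return out
    log_f0 = np.log2(f0[voiced])
    mean = log_f0.mean()
    # Random global shift (in octaves) and random scaling of the pitch range.
    shift = rng.uniform(-max_shift_semitones, max_shift_semitones) / 12.0
    scale = rng.uniform(*range_scale)
    out[voiced] = 2.0 ** (mean + shift + scale * (log_f0 - mean))
    return out


def perturb_formants(envelope, rng, max_ratio=1.4):
    """Warp the frequency axis of a spectral envelope (hypothetical helper).

    `envelope` has shape (frames, bins). A ratio above 1 moves spectral
    peaks toward higher bins, a ratio below 1 toward lower bins.
    """
    envelope = np.asarray(envelope, dtype=float)
    n_bins = envelope.shape[1]
    ratio = rng.uniform(1.0, max_ratio)
    if rng.random() < 0.5:
        ratio = 1.0 / ratio
    bins = np.arange(n_bins)
    # Output bin k reads the input at position k / ratio (clipped to range).
    src = np.clip(bins / ratio, 0, n_bins - 1)
    return np.stack([np.interp(src, bins, frame) for frame in envelope])


rng = np.random.default_rng(0)
f0 = np.array([0.0, 200.0, 220.0, 0.0, 180.0])
envelope = np.abs(rng.standard_normal((3, 64)))
new_f0 = perturb_pitch(f0, rng)
new_envelope = perturb_formants(envelope, rng)
```

In this sketch the pitch perturbation disturbs the pitch level and range while keeping the voiced/unvoiced pattern, and the envelope warp disturbs formant positions while keeping the frame count, which is the general idea of removing one kind of information from a signal before modeling the other.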

Original language: English
Pages (from-to): 1948-1952
Number of pages: 5
Journal: IEEE Signal Processing Letters
Volume: 29
DOIs
State: Published - 2022

Keywords

  • Cross-speaker emotion transfer
  • emotional TTS
  • information perturbation
  • speech synthesis
