MixBAS: A Transformer-Based End-to-End Mixed Mono to Binaural Audio Synthesis Method

  • Ningning Pan
  • Yuanxin Guo
  • Jilu Jin
  • Zhongpu Chen
  • Yu Zhao
  • Jingdong Chen
  • Jacob Benesty
  • Southwestern University of Finance and Economics
  • Northwestern Polytechnical University Xian
  • Institut national de la recherche scientifique

Research output: Contribution to journal, Article, peer-reviewed

Abstract

Binaural audio is essential for delivering immersive spatial auditory experiences through headsets. However, due to the high cost and complexity of binaural recording, there has been growing research interest in binaural audio synthesis (BAS) from monaural inputs. In natural listening environments, humans typically perceive multiple concurrent sound sources, yet most existing BAS approaches render each source independently and therefore rely on perfect source signal separation, a condition that is rarely achievable in practice and whose failure often degrades perceptual quality. To address this limitation, this paper proposes MixBAS, a transformer-based end-to-end multi-source mono-to-binaural synthesis framework that eliminates the need for explicit source separation. We design an asymmetric transformer that spatializes a mono mixture, which comprises both speech and non-speech components, into its binaural counterpart by incorporating a user-defined positional prompt for the non-speech source. When reproduced over headphones, the generated binaural audio enables listeners to perceive a high-quality speech signal along with a non-speech source rendered at a user-specified spatial location. Experimental results demonstrate that MixBAS significantly outperforms existing BAS baselines that rely on source separation in both objective metrics and perceptual quality.

Original language: English
Pages (from-to): 1840-1852
Number of pages: 13
Journal: IEEE Transactions on Audio, Speech and Language Processing
Volume: 34
State: Published - 2026

Keywords

  • Mono-to-binaural audio synthesis
  • asymmetric transformer
  • audio spatialization
  • binaural audio rendering
  • multi-source binaural audio synthesis
