Learning optimal features for music transcription

Huaiping Ming, Dongyan Huang, Lei Xie, Haizhou Li

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

This paper aims to design time-frequency representation (TFR) functions for automatic music transcription. It is desirable that the decomposition of those TFR functions be suitable for notes whose pitch and spectral envelope both vary over time. The Harmonic Adaptive Latent Component Analysis (HALCA) model adopted in this paper allows both kinds of variation to be considered simultaneously. We evaluate the influence of three TFR functions, namely the IIR and FIR filter bank semigrams (FBSG) and the constant-Q transform (CQT) semigram, on the automatic music transcription task, using a database of popular and polyphonic classical music. The experimental results show that the filter-bank-based representations are suitable for multiple-instrument recordings, while the CQT-based representation provides very accurate transcription for solo-instrument recordings.

Original language: English
Title of host publication: 2014 IEEE China Summit and International Conference on Signal and Information Processing, IEEE ChinaSIP 2014 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 105-109
Number of pages: 5
ISBN (Electronic): 9781479954032
State: Published - 3 Sep 2014
Event: 2nd IEEE China Summit and International Conference on Signal and Information Processing, IEEE ChinaSIP 2014 - Xi'an, China
Duration: 9 Jul 2014 → 13 Jul 2014

Publication series

Name: 2014 IEEE China Summit and International Conference on Signal and Information Processing, IEEE ChinaSIP 2014 - Proceedings

Conference

Conference: 2nd IEEE China Summit and International Conference on Signal and Information Processing, IEEE ChinaSIP 2014
Country/Territory: China
City: Xi'an
Period: 9/07/14 → 13/07/14

Keywords

  • constant-Q transform
  • filter bank
  • logarithmic compression
  • music transcription
  • Semigram features
