Pitch-density-based features and an SVM binary tree approach for multi-class audio classification in broadcast news

  • Lei Xie
  • Zhong Hua Fu
  • Wei Feng
  • Yong Luo
  • Northwestern Polytechnical University Xian
  • City University of Hong Kong

Research output: Contribution to journal › Article › peer-review

33 Scopus citations

Abstract

Audio classification is an essential task in multimedia content analysis and a prerequisite for tasks such as segmentation, indexing and retrieval. This paper describes our study of multi-class audio classification in broadcast news, a popular multimedia repository with rich audio types. Motivated by the tonal regulations of music, we propose two pitch-density-based features, namely average pitch-density (APD) and relative tonal power density (RTPD). We use an SVM binary tree (SVM-BT) to hierarchically classify an audio clip into five classes: pure speech, music, environment sound, speech with music, and speech with environment sound. Since an SVM is a binary classifier, the SVM-BT architecture realizes coarse-to-fine multi-class classification with high accuracy and efficiency. Experiments show that the proposed one-dimensional APD and RTPD features achieve accuracy comparable to that of popular high-dimensional features in speech/music discrimination, and that the SVM-BT approach demonstrates superior performance in multi-class audio classification. With the help of the pitch-density-based features, we achieve a high average accuracy of 94.2% in the five-class audio classification task.
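The coarse-to-fine idea behind a binary tree of two-class decisions can be illustrated with a minimal sketch. Everything specific below is an assumption made for illustration and is not taken from the paper: the tree topology, the three-value feature vector (hypothetical speech, music and environment scores), and the `threshold` helper, which is a trivial stand-in for a trained SVM at each node. The abstract does not state how the authors arrange their tree or which features feed each node.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Union

Features = Sequence[float]
# A two-class decision; in an SVM-BT each node would hold a trained SVM.
BinaryClassifier = Callable[[Features], bool]


@dataclass
class Leaf:
    label: str


@dataclass
class Node:
    decide: BinaryClassifier
    if_true: "Tree"
    if_false: "Tree"


Tree = Union[Leaf, Node]


def classify(tree: Tree, x: Features) -> str:
    """Walk from the root to a leaf, applying one binary decision per level."""
    while isinstance(tree, Node):
        tree = tree.if_true if tree.decide(x) else tree.if_false
    return tree.label


def threshold(index: int, cut: float = 0.5) -> BinaryClassifier:
    """Hypothetical stand-in for an SVM: compare one feature to a fixed cut."""
    return lambda x: x[index] > cut


def count_nodes(tree: Tree) -> int:
    """Number of internal nodes, i.e. binary classifiers in the tree."""
    if isinstance(tree, Leaf):
        return 0
    return 1 + count_nodes(tree.if_true) + count_nodes(tree.if_false)


# Hypothetical feature layout: [speech score, music score, environment score].
SPEECH, MUSIC, ENV = 0, 1, 2

# Assumed topology: first "speech present?", then refine within each branch.
TREE = Node(
    threshold(SPEECH),
    Node(
        threshold(MUSIC),
        Leaf("speech with music"),
        Node(
            threshold(ENV),
            Leaf("speech with environment sound"),
            Leaf("pure speech"),
        ),
    ),
    Node(threshold(MUSIC), Leaf("music"), Leaf("environment sound")),
)
```

With this layout, five classes need four binary classifiers, and a clip passes through at most three of them. Calling `classify(TREE, [0.9, 0.8, 0.1])` follows the speech branch and then the music branch. Swapping each `threshold` for a trained two-class SVM would leave the traversal logic unchanged.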

Original language: English
Pages (from-to): 101-112
Number of pages: 12
Journal: Multimedia Systems
Volume: 17
Issue number: 2
State: Published - Mar 2011

Keywords

  • Audio classification
  • Audio diarization
  • Multimedia content analysis
  • Support vector machines
