Abstract
Audio classification is an essential task in multimedia content analysis and a prerequisite for a variety of tasks such as segmentation, indexing, and retrieval. This paper describes our study on multi-class audio classification of broadcast news, a popular multimedia repository that is rich in audio types. Motivated by the tonal regularities of music, we propose two pitch-density-based features, namely average pitch density (APD) and relative tonal power density (RTPD). We use an SVM binary tree (SVM-BT) to classify an audio clip hierarchically into five classes: pure speech, music, environment sound, speech with music, and speech with environment sound. Because an SVM is a binary classifier, the SVM-BT architecture is used to realize coarse-to-fine multi-class classification with high accuracy and efficiency. Experiments show that the proposed one-dimensional APD and RTPD features achieve accuracy comparable to that of popular high-dimensional features in speech/music discrimination, and that the SVM-BT approach demonstrates superior performance in multi-class audio classification. With the help of the pitch-density-based features, we achieve an average accuracy of 94.2% on the five-class audio classification task.
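The abstract does not give the tree topology, the SVM kernel, or the feature definitions, so the following is only a minimal sketch of the general SVM binary-tree idea under stated assumptions. The hierarchy (non-speech vs. speech-containing first), the linear SVM trained with Pegasos-style stochastic subgradient descent (swapped in for a full SVM solver), the synthetic 3-D toy features (which are NOT APD or RTPD values), and all names (`train_linear_svm`, `Node`, `fit`, `predict`, `TREE`, `CENTERS`) are hypothetical. Each internal node trains one binary SVM on the samples of the classes beneath it, and a clip is classified by walking from the root to a leaf.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Sequence, Set, Tuple, Union


def train_linear_svm(X: Sequence[Sequence[float]], y: Sequence[int],
                     lam: float = 0.1, epochs: int = 300, seed: int = 0) -> List[float]:
    """Linear SVM via Pegasos-style stochastic subgradient descent (labels +1/-1).

    A constant 1.0 is appended to each sample so the last weight acts as a bias;
    for brevity the bias is regularized together with the other weights.
    """
    rng = random.Random(seed)
    data = [list(x) + [1.0] for x in X]
    w = [0.0] * len(data[0])
    t = 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, data[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink: L2 regularizer step
            if margin < 1.0:  # hinge loss active: move toward y_i * x_i
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, data[i])]
    return w


@dataclass
class Node:
    """Internal tree node; children are subtrees or leaf class labels."""
    left: "Union[Node, str]"   # taken when the node's SVM score is >= 0
    right: "Union[Node, str]"  # taken when the node's SVM score is < 0
    w: Optional[List[float]] = None


def leaves(t: "Union[Node, str]") -> Set[str]:
    return {t} if isinstance(t, str) else leaves(t.left) | leaves(t.right)


def fit(t: "Union[Node, str]", X, labels) -> None:
    """Train one binary SVM per internal node on the samples beneath that node."""
    if isinstance(t, str):
        return
    pos, neg = leaves(t.left), leaves(t.right)
    idx = [i for i, lab in enumerate(labels) if lab in pos or lab in neg]
    t.w = train_linear_svm([X[i] for i in idx],
                           [1 if labels[i] in pos else -1 for i in idx])
    fit(t.left, X, labels)
    fit(t.right, X, labels)


def predict(t: "Union[Node, str]", x) -> Tuple[str, int]:
    """Walk root -> leaf; return (class label, number of SVMs evaluated)."""
    steps = 0
    while not isinstance(t, str):
        score = sum(wj * xj for wj, xj in zip(t.w, list(x) + [1.0]))
        t = t.left if score >= 0 else t.right
        steps += 1
    return t, steps


# Hypothetical coarse-to-fine hierarchy (not necessarily the paper's tree).
TREE = Node(Node("music", "environment"),
            Node("pure speech", Node("speech+music", "speech+environment")))

# Synthetic toy features (NOT APD/RTPD): class centers plus small offsets.
CENTERS = {
    "music": (-1.0, 1.0, 0.0), "environment": (-1.0, -1.0, 0.0),
    "pure speech": (1.0, 0.0, -1.0),
    "speech+music": (1.0, 1.0, 1.0), "speech+environment": (1.0, -1.0, 1.0),
}
OFFSETS = [(0.1, 0.1, 0.1), (0.1, -0.1, -0.1), (-0.1, 0.1, -0.1), (-0.1, -0.1, 0.1)]
X = [tuple(c + o for c, o in zip(center, off))
     for center in CENTERS.values() for off in OFFSETS]
labels = [name for name in CENTERS for _ in OFFSETS]

fit(TREE, X, labels)
for name, center in CENTERS.items():
    print(name, predict(TREE, center))
```

Under this hypothetical tree, five classes need only four binary SVMs (one per internal node), and a clip is labeled after at most three SVM evaluations, whereas a one-vs-one scheme would train 5·4/2 = 10 pairwise classifiers. The price of the tree is that an error at an upper node cannot be corrected further down, which is why the coarse splits are placed first.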
| Original language | English |
|---|---|
| Pages (from-to) | 101-112 |
| Number of pages | 12 |
| Journal | Multimedia Systems |
| Volume | 17 |
| Issue number | 2 |
| DOIs | |
| State | Published - Mar 2011 |
Keywords
- Audio classification
- Audio diarization
- Multimedia content analysis
- Support vector machines
Title
Pitch-density-based features and an SVM binary tree approach for multi-class audio classification in broadcast news