Music emotion retrieval based on acoustic features

James Jie Deng*, Clement H. C. Leung

*Corresponding author for this work

Research output: Chapter in book/report/conference proceeding › Conference contribution › peer-review

1 Citation (Scopus)

Abstract

Music emotion expresses inherent, high-level states of mind and spiritual qualities. In this paper, a hierarchical framework is proposed that consists of two layers: an external layer representing preliminary, superficial emotions and an inherent layer representing psychic, resonant emotions. From these two layers, a Resonance-Arousal-Valence (RAV) emotion model is constructed. Five feature sets (intensity, timbre, rhythm, pitch and tonality, and harmony) are extracted to represent music emotions in the RAV model. To represent emotions effectively with the extracted features, suitable weighting schemes are used to balance the different feature sets. As each music clip may carry rather complex emotions, a supervised multiclass label model is adopted to annotate each clip with an emotion multinomial. Preliminary experimental results indicate that the proposed emotion model and retrieval approach deliver good retrieval performance.
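To make the pipeline above concrete, the sketch below illustrates one plausible realization in Python: extracting weighted proxies for the five feature sets with the librosa library, annotating a clip with an emotion multinomial via a trained probabilistic classifier, and ranking a library of clips by divergence from a query's multinomial. The specific feature proxies (RMS energy, MFCCs, tempo, chroma, tonnetz), the classifier interface, and the KL-divergence ranking are illustrative assumptions, not taken from the paper.

```python
# Illustrative sketch of the abstract's pipeline: five weighted acoustic
# feature sets, an emotion multinomial per clip, and retrieval by ranking
# clips whose multinomials lie closest to the query's. Feature proxies and
# the KL-divergence ranking are assumptions, not the paper's implementation.
import numpy as np
import librosa

def extract_features(path, weights=(1.0, 1.0, 1.0, 1.0, 1.0)):
    """Concatenate weighted proxies for the five feature sets."""
    y, sr = librosa.load(path, sr=22050, mono=True)
    intensity = np.atleast_1d(librosa.feature.rms(y=y).mean())            # loudness proxy
    timbre = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).mean(axis=1)     # spectral shape
    tempo, _ = librosa.beat.beat_track(y=y, sr=sr)                        # rhythm proxy
    rhythm = np.atleast_1d(tempo).astype(float)
    pitch_tonality = librosa.feature.chroma_stft(y=y, sr=sr).mean(axis=1) # pitch classes
    harmony = librosa.feature.tonnetz(y=librosa.effects.harmonic(y), sr=sr).mean(axis=1)
    parts = [intensity, timbre, rhythm, pitch_tonality, harmony]
    return np.concatenate([w * p for w, p in zip(weights, parts)])

def emotion_multinomial(features, clf):
    """Probability distribution over emotion classes for one clip."""
    return clf.predict_proba(features.reshape(1, -1))[0]

def rank_by_emotion(query_dist, library_dists, eps=1e-10):
    """Rank library clips by KL divergence from the query's multinomial."""
    q = query_dist + eps
    scores = [float(np.sum(q * np.log(q / (d + eps)))) for d in library_dists]
    return np.argsort(scores)  # indices of best-matching clips first
```

Here `clf` stands for any probabilistic multiclass classifier (for example, a scikit-learn model exposing `predict_proba`) trained on clips annotated in the RAV space; the paper's own weighting schemes and label model may differ.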

Original language: English
Title of host publication: Advances in Electric and Electronics
Pages: 169-177
Number of pages: 9
DOIs
Publication status: Published - 2012
Event: 2012 2nd International Conference on Electric and Electronics, EEIC 2012 - Sanya, China
Duration: 21 Apr 2012 - 22 Apr 2012

Publication series

Name: Lecture Notes in Electrical Engineering
Volume: 155 LNEE
ISSN (Print): 1876-1100
ISSN (Electronic): 1876-1119

Conference

Conference: 2012 2nd International Conference on Electric and Electronics, EEIC 2012
Country/Territory: China
City: Sanya
Period: 21/04/12 - 22/04/12

Scopus Subject Areas

  • Industrial and Manufacturing Engineering

User-Defined Keywords

  • arousal
  • music emotion
  • music emotion retrieval
  • resonance
  • valence

