TY - GEN
T1 - Learning music emotion primitives via supervised dynamic clustering
AU - Liu, Yan
AU - Zhang, Xiang
AU - Chen, Gong
AU - Zhang, Kejun
AU - Liu, Yang
N1 - Funding Information:
This work was supported in part by the National Natural Science Foundation of China under Grants 61503317 and 61373122.
PY - 2016/10/1
Y1 - 2016/10/1
N2 - This paper explores a fundamental problem in music emotion analysis, i.e., how to segment a music sequence into a set of basic emotive units, which are named emotion primitives. Current work on music emotion analysis is mainly based on fixed-length music segments, which often hinders accurate emotion recognition. A short segment, such as an individual music frame, may fail to evoke an emotional response. A long segment, such as an entire song, may convey various emotions over time. Moreover, the minimum segment length varies with the type of emotion. To address these problems, we propose a novel method dubbed supervised dynamic clustering (SDC) to automatically decompose the music sequence into meaningful segments of various lengths. First, the music sequence is represented by a set of music frames. Then, the music frames are clustered according to their valence-arousal values in the emotion space. The clustering results are used to initialize the music segmentation. After that, a dynamic programming scheme is employed to jointly optimize the subsequent segmentation and grouping in the music feature space. Experimental results on a standard dataset show both the effectiveness and the rationality of the proposed method.
AB - This paper explores a fundamental problem in music emotion analysis, i.e., how to segment a music sequence into a set of basic emotive units, which are named emotion primitives. Current work on music emotion analysis is mainly based on fixed-length music segments, which often hinders accurate emotion recognition. A short segment, such as an individual music frame, may fail to evoke an emotional response. A long segment, such as an entire song, may convey various emotions over time. Moreover, the minimum segment length varies with the type of emotion. To address these problems, we propose a novel method dubbed supervised dynamic clustering (SDC) to automatically decompose the music sequence into meaningful segments of various lengths. First, the music sequence is represented by a set of music frames. Then, the music frames are clustered according to their valence-arousal values in the emotion space. The clustering results are used to initialize the music segmentation. After that, a dynamic programming scheme is employed to jointly optimize the subsequent segmentation and grouping in the music feature space. Experimental results on a standard dataset show both the effectiveness and the rationality of the proposed method.
KW - Emotion primitives
KW - Music emotion analysis
KW - Supervised dynamic clustering
UR - http://www.scopus.com/inward/record.url?scp=84994592410&partnerID=8YFLogxK
U2 - 10.1145/2964284.2967215
DO - 10.1145/2964284.2967215
M3 - Conference proceeding
AN - SCOPUS:84994592410
T3 - MM 2016 - Proceedings of the 2016 ACM Multimedia Conference
SP - 222
EP - 226
BT - MM 2016 - Proceedings of the 2016 ACM Multimedia Conference
PB - Association for Computing Machinery (ACM)
T2 - 24th ACM Multimedia Conference, MM 2016
Y2 - 15 October 2016 through 19 October 2016
ER -