This paper explores a fundamental problem in music emotion analysis: how to segment a music sequence into a set of basic emotive units, which we call emotion primitives. Existing work on music emotion analysis relies mainly on fixed-length music segments, which often makes accurate emotion recognition difficult. A short segment, such as an individual music frame, may fail to evoke an emotional response. A long segment, such as an entire song, may convey several different emotions over time. Moreover, the minimum segment length required varies with the type of emotion. To address these problems, we propose a novel method, dubbed supervised dynamic clustering (SDC), that automatically decomposes a music sequence into meaningful segments of variable length. First, the music sequence is represented as a set of music frames. Then, the frames are clustered according to their valence-arousal values in the emotion space, and the clustering results are used to initialize the music segmentation. After that, a dynamic programming scheme is employed to jointly optimize the subsequent segmentation and grouping in the music feature space. Experimental results on a standard dataset demonstrate both the effectiveness and the rationality of the proposed method.
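The pipeline described above can be illustrated with a minimal sketch. It is not the SDC formulation itself, whose objective is not given in this abstract. It assumes that per-frame cluster labels have already been obtained from the valence-arousal values, that each frame has a numeric feature vector, and that segment cost is the summed squared Euclidean distance to a cluster centroid. The names `initial_segments`, `cluster_centroids`, `dp_refine`, and the `min_len` parameter are hypothetical and introduced only for illustration.

```python
def initial_segments(labels):
    """Initial segmentation: maximal runs of frames sharing a cluster label.

    Returns a list of (start, end, label) with end exclusive.
    """
    segments, start = [], 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            segments.append((start, i, labels[start]))
            start = i
    return segments


def cluster_centroids(features, labels, k):
    """Mean feature vector of each cluster (zeros if a cluster is empty)."""
    dim = len(features[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for x, c in zip(features, labels):
        counts[c] += 1
        for d in range(dim):
            sums[c][d] += x[d]
    return [[s / counts[c] if counts[c] else 0.0 for s in sums[c]]
            for c in range(k)]


def dp_refine(features, cents, min_len=1):
    """Jointly choose segment boundaries and segment labels by dynamic programming.

    Assumed cost: each segment is assigned to one centroid and pays the summed
    squared distance of its frames to that centroid. Every segment must span at
    least min_len frames. Adjacent segments with equal labels are merged.
    """
    n, k = len(features), len(cents)
    # prefix[c][t] = summed squared distance of frames [0, t) to centroid c
    prefix = [[0.0] * (n + 1) for _ in range(k)]
    for c in range(k):
        for t, x in enumerate(features):
            d2 = sum((a - b) ** 2 for a, b in zip(x, cents[c]))
            prefix[c][t + 1] = prefix[c][t] + d2

    inf = float("inf")
    best = [inf] * (n + 1)   # best[t]: lowest cost of covering frames [0, t)
    back = [None] * (n + 1)  # back[t]: (start, label) of the last segment
    best[0] = 0.0
    for t in range(min_len, n + 1):
        for s in range(0, t - min_len + 1):
            if best[s] == inf:
                continue
            for c in range(k):
                cost = best[s] + (prefix[c][t] - prefix[c][s])
                if cost < best[t]:
                    best[t], back[t] = cost, (s, c)
    if best[n] == inf:
        raise ValueError("no segmentation satisfies the length constraint")

    # Backtrack from the end, then merge neighbours that share a label.
    segments, t = [], n
    while t > 0:
        s, c = back[t]
        segments.append((s, t, c))
        t = s
    merged = []
    for s, e, c in reversed(segments):
        if merged and merged[-1][2] == c:
            merged[-1] = (merged[-1][0], e, c)
        else:
            merged.append((s, e, c))
    return merged
```

As a usage example under these assumptions, take ten one-dimensional frames `[[0.0], [0.1], [0.0], [3.0], [0.1], [0.0], [5.0], [5.1], [4.9], [5.0]]` with initial labels `[0, 0, 0, 1, 0, 0, 1, 1, 1, 1]`. `initial_segments` yields four segments, including a single-frame segment at index 3. Calling `dp_refine` with `min_len=2` and the centroids from `cluster_centroids` absorbs that isolated frame and returns `[(0, 6, 0), (6, 10, 1)]`. The minimum-length constraint is one simple way to reflect the observation that a single frame is too short to carry an emotion.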