Music emotion expresses inherent, high-level states of mind and spiritual qualities. In this paper, a hierarchical framework is proposed that consists of two layers: an external layer representing preliminary, superficial emotions, and an inherent layer representing psychic, resonant emotions. On top of these two layers, a Resonance-Arousal-Valence (RAV) emotion model is constructed. Five feature sets, covering intensity, timbre, rhythm, pitch and tonality, and harmony, are extracted to represent music emotions in the RAV model. To represent emotions effectively with the extracted features, suitable weighting schemes are employed to balance the contributions of the different features. Because each music clip may convey rather complex emotions, a supervised multiclass labeling model is adopted to annotate emotions with an emotion multinomial. Preliminary experimental results indicate that the proposed emotion model and retrieval approach deliver good retrieval performance.
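The weighted combination of feature sets into an emotion multinomial can be sketched as follows; this is a minimal illustration under assumed details, since the abstract does not specify the weighting scheme or scoring function (the function name `emotion_multinomial`, the per-feature-set scores, and the softmax normalization are all illustrative assumptions, not the paper's actual method):

```python
import numpy as np

# Feature-set names follow the abstract; weights and the scoring scheme
# below are illustrative assumptions, not the paper's actual formulation.
FEATURE_SETS = ["intensity", "timbre", "rhythm", "pitch_tonality", "harmony"]

def emotion_multinomial(scores: dict, weights: dict) -> np.ndarray:
    """Weighted sum of per-feature-set emotion scores, normalized to a
    probability vector (the "emotion multinomial") via a stable softmax.

    scores:  feature-set name -> raw score per emotion class
    weights: feature-set name -> scalar weight balancing that feature set
    """
    combined = sum(weights[name] * np.asarray(scores[name], dtype=float)
                   for name in FEATURE_SETS)
    exp = np.exp(combined - combined.max())  # subtract max for stability
    return exp / exp.sum()

# Toy usage: 4 hypothetical emotion classes, equal feature-set weights.
scores = {name: [0.2, 1.0, 0.1, 0.5] for name in FEATURE_SETS}
weights = {name: 0.2 for name in FEATURE_SETS}
p = emotion_multinomial(scores, weights)  # probabilities summing to 1
```

Because the output is a proper probability distribution over emotion classes, a single clip can carry several emotions at once with graded strengths, which matches the multiclass labeling described above.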