Emotion-based music retrieval offers a natural, human-centered way for people to experience music. In this paper, we use the three-dimensional Resonance-Arousal-Valence emotion model to represent the emotions evoked by music, and we establish, based on this model, the relationship between acoustic features and their emotional impact. We also consider emotional tag features of music and represent acoustic features and emotional tag features jointly in a low-dimensional music emotion embedding space. This joint emotion space is optimized through dimensionality reduction by minimizing the joint loss of the acoustic features and the emotional tag features. Finally, we construct a unified framework for music retrieval in the joint emotion space that supports query-by-music, query-by-tag, or both combined, and we apply our proposed ranking algorithm to return an optimized ranked list of the music with the highest emotional similarity to the query. The experimental results show that the joint emotion space and the unified framework produce satisfactory results for emotion-based music retrieval.