Expressive and Controllable AI Music Creation based on Audio-Oriented Representation Analysis on Large-scale Data

  • XUE, Wei (PI)

Project: Research project

Project Details


Music represents human creativity and is an essential carrier of culture. Using artificial intelligence (AI) to create music, namely AI music creation (AMC), has become a popular and important topic because of its significant value to scientific research and society. Specifically, the topic directly addresses the problem of artificial creativity and has great potential for cultural preservation, media, entertainment, and the metaverse. AMC is conventionally tackled by splitting the process into composition, performance control, and audio synthesis stages. Although much progress has been made, current methods still have limited capability to produce expressive and emotive music, to achieve harmonic organization that accounts for timbres and sound effects across multiple tracks, and to allow flexible control of the output. There are two main reasons. First, the early stages, including composition and performance control, mainly work on symbolic music, which is inadequate for modeling complex attributes such as emotion, sound effects, and timbre. Second, the stages are studied and optimized separately, which prevents the early stages from accounting for the overall result as perceived by a human listener.

It is known that humans appreciate music in audio form. Human composers use their auditory perception to interactively evaluate the pieces they generate in terms of expressiveness and multi-track harmony. As a result, information about all of these factors is gathered at each stage of music creation. In other words, perceptual ability supports the creation process. Machines need a similar framework to achieve pleasant AMC.

Since all factors, including expressiveness, emotion, harmony, and timbre, are encoded in the audio, the audio domain provides a suitable platform for both perception and creation. This project aims to develop a new audio-oriented framework for AMC that can perform end-to-end joint optimization over all of these factors. We take representation analysis as the core of the project: the representations capture the external characteristics of music and enable joint optimization, flexible control, and re-creation. Moreover, as most open-source music is in audio form, audio-oriented AMC will benefit from large-scale learning. The research objectives of the project are: 1) collecting and releasing a large-scale audio-oriented music database for AMC; 2) conducting large-scale disentangled representation learning to find out how to represent and analyse the complex external characteristics of music; 3) performing expressive and controllable multi-track AMC based on joint optimization of the representations; and 4) enabling re-creation of existing audio music through representation transformation, so that the music can be precisely edited.
Effective start/end date: 1/01/23 → …

