The intersection of artificial intelligence (AI) and music has become an essential field of study, with Large Language Models (LLMs) playing a significant role in sequence generation. Skywork AI PTE. LTD. and the Hong Kong University of Science and Technology have developed ChatMusician, a text-based LLM, to tackle the challenge of understanding and generating music.
ChatMusician shows potential in addressing both the difficulties of generating music via language modeling and the gaps in publicly available music datasets. The team gathered metadata from two million music recordings on YouTube and sampled 500,000 of them. They used GPT-4 to create summaries of these metadata records and generated music-knowledge QA pairs using Self-instruct. According to their empirical evaluations, the resulting model outperformed baselines, including GPT-4, on various music-generation tasks.
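The data pipeline described above (sample a subset of the metadata, summarize each record with an LLM, then spin QA pairs out of the summaries Self-instruct style) can be sketched roughly as follows. The record fields, prompt wording, and seed examples are illustrative assumptions, not the authors' actual code, and the LLM calls themselves are left out.

```python
import json
import random

# Toy stand-ins for the ~2M YouTube metadata records (field names are assumed).
records = [
    {"title": f"Tune {i}", "uploader": "some_channel", "tags": ["folk", "fiddle"],
     "description": "A traditional reel recorded live."}
    for i in range(1000)
]

# Step 1: sample a subset (the paper samples 500K of ~2M; scaled down here).
subset = random.sample(records, k=250)

def summarization_prompt(rec):
    """Prompt asking an LLM (GPT-4 in the paper) to summarize one record."""
    return ("Summarize the following music metadata in two sentences:\n"
            + json.dumps(rec, indent=2))

def self_instruct_prompt(summary, seed_qa):
    """Self-instruct-style prompt: seed QA pairs steer generation of new ones."""
    seeds = "\n".join(f"Q: {q}\nA: {a}" for q, a in seed_qa)
    return ("Here are example music-knowledge QA pairs:\n"
            f"{seeds}\n\n"
            "Based on this summary, write one new QA pair:\n"
            f"{summary}")

seed_qa = [("What meter is a reel in?", "A reel is in 4/4 (or 2/2) time.")]
prompt = self_instruct_prompt("A lively fiddle reel from Ireland.", seed_qa)
```

Each prompt would then be sent to the LLM, and the returned QA pairs collected into the training corpus.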
The researchers believe that incorporating math and code data could further enhance reasoning over symbolic music, an ability the symbolic-music datasets in the computational music community currently lack. To confirm their system's effectiveness, they also created MusicTheoryBench, a test built from college-level course materials and past exams, with items formatted as JSON containing ABC-notation strings.
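A benchmark item of this shape might look like the following. The field names and the ABC tune are illustrative assumptions, not taken from the actual benchmark; the point is only that the musical content travels as an ABC string inside a JSON record.

```python
import json

# Hypothetical benchmark-style item: a JSON record whose musical content is
# an ABC-notation string (field names are assumed, not the real schema).
item = {
    "question": "Name the key of the following tune.",
    "abc": "X:1\nT:Example Scale\nM:4/4\nL:1/4\nK:G\nG A B c|d e f g|]",
    "choices": ["C major", "G major", "D major", "A minor"],
    "answer": "G major",
}

encoded = json.dumps(item)      # serialize as one benchmark line (JSONL-style)
decoded = json.loads(encoded)   # round-trips losslessly

# ABC headers are line-oriented: K: carries the key signature.
key_line = [ln for ln in decoded["abc"].splitlines() if ln.startswith("K:")][0]
```

Because ABC is plain text, such items can be fed to a text-only LLM with no special tokenizer changes, which is the appeal of the format here.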
The team’s findings suggest that current Large Language Models struggle with benchmarks like MusicTheoryBench. This gap indicates a vast and largely unexplored realm of musical understanding that needs further research, analogous to code and mathematical reasoning. To promote collaboration in the field, all components of the framework, including the benchmark, the scripts, and the 4B-token music-language corpus MusicPile, have been open-sourced.
However, the current version of ChatMusician primarily produces Irish-style music, since a large portion of its dataset derives from that genre. Because of the limited variety of handcrafted music instructions, the model also struggles with open-ended music-generation tasks and is prone to hallucinations.
This work represents a significant advancement in AI-generated music and a substantial step toward an open-source Large Language Model with intrinsic musical abilities. With further study, researchers could overcome the shortcomings of current models and continue to explore the fascinating intersection of language, music, and artificial intelligence.