Large language models (LLMs) are a major area of study in machine learning. Recently, model merging, the combination of multiple LLMs into a single model, has drawn considerable interest from the research community because it requires no additional training, which substantially reduces the cost of creating new models.
Prior advances such as the "model soup" approach, DARE, and Neural Architecture Search (NAS) have made useful contributions, but these frameworks have limitations. NAS, for instance, requires substantial computational resources, and the space of possible merges remains largely underexplored.
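To make the parameter-space idea concrete, here is a minimal sketch of "model soup"-style merging, where the merged weights are simply a (possibly weighted) average of the source checkpoints. The toy state dicts, key names, and NumPy representation are illustrative assumptions, not the actual implementation used in any of these works.

```python
# Schematic "model soup"-style merge: average checkpoints key by key.
# Real LLM merging operates on transformer state dicts; plain dicts of
# NumPy arrays stand in for checkpoints here.
import numpy as np

def merge_parameter_space(checkpoints, weights=None):
    """Return the (weighted) average of a list of state dicts."""
    if weights is None:
        weights = [1.0 / len(checkpoints)] * len(checkpoints)
    merged = {}
    for key in checkpoints[0]:
        merged[key] = sum(w * ckpt[key] for w, ckpt in zip(weights, checkpoints))
    return merged

# Two toy "models" with identical architectures (same keys and shapes).
model_a = {"layer.0.weight": np.ones((4, 4)), "layer.0.bias": np.zeros(4)}
model_b = {"layer.0.weight": np.full((4, 4), 3.0), "layer.0.bias": np.ones(4)}

soup = merge_parameter_space([model_a, model_b])                # uniform soup
biased = merge_parameter_space([model_a, model_b], [0.8, 0.2])  # weighted mix
print(soup["layer.0.weight"][0, 0])    # 2.0
print(biased["layer.0.weight"][0, 0])  # 1.4
```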
In response, researchers at Sakana AI have developed a new methodology that uses evolutionary algorithms to guide the merging of foundation models. Their technique searches both the parameter space (how weights are mixed) and the data flow space (which layers the inference path passes through), enabling cross-domain merges. They also introduce a unified framework that combines both spaces and optimizes the inference path through the merged model.
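The sketch below illustrates, under stated assumptions, how an evolutionary search over parameter-space merge coefficients could be set up: the genome is a vector of per-layer mixing ratios, and the fitness would normally be the merged model's score on a benchmark. The population size, mutation scale, layer count, and the placeholder evaluate function are all hypothetical; Sakana AI's actual recipe is more sophisticated than this simple truncation-selection loop.

```python
# Hedged sketch: evolve per-layer mixing ratios for a parameter-space merge.
import numpy as np

rng = np.random.default_rng(0)
NUM_LAYERS = 32            # assumed transformer depth
POP, GENERATIONS = 16, 20  # assumed search budget

def evaluate(genome):
    """Placeholder fitness: stands in for scoring the merged model on a
    benchmark such as MGSM-JA. Here it simply rewards ratios near 0.5."""
    return -float(np.mean((genome - 0.5) ** 2))

# Each genome is a vector of per-layer mixing ratios in [0, 1].
population = rng.uniform(0.0, 1.0, size=(POP, NUM_LAYERS))
for _ in range(GENERATIONS):
    scores = np.array([evaluate(g) for g in population])
    elite = population[np.argsort(scores)[-POP // 4:]]          # keep top 25%
    mutants = np.clip(
        elite[rng.integers(len(elite), size=POP - len(elite))]
        + rng.normal(0.0, 0.05, size=(POP - len(elite), NUM_LAYERS)),
        0.0, 1.0)
    population = np.vstack([elite, mutants])                    # elitist refill

best = population[int(np.argmax([evaluate(g) for g in population]))]
print("best per-layer ratios (first 5):", np.round(best[:5], 2))
```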
The result is a general method called Evolutionary Model Merge, which automatically discovers effective combinations of diverse open-source models to create new ones. It also supports cross-domain merging, identifying ways to combine models from different domains into a hybrid model with enhanced capabilities.
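For the data-flow-space side, the following toy example sketches what a searched inference path might look like: hidden states pass through layers drawn from different source models in an evolved order. The lambda "layers", the bank names "A" and "B", and the hand-picked path are illustrative stand-ins rather than the paper's actual configuration.

```python
# Schematic data-flow-space (DFS) merge: the inference path itself is the
# object being searched, not the weights.
import numpy as np

# Toy "layers" from two source models (real ones would be transformer
# blocks from, e.g., a math-specialized model and a Japanese model).
model_a_layers = [lambda h, i=i: h + i for i in range(4)]
model_b_layers = [lambda h, i=i: h * (1.0 + 0.1 * i) for i in range(4)]
banks = {"A": model_a_layers, "B": model_b_layers}

# One candidate inference path: which model's layer to apply at each step.
# In an evolutionary setup, this sequence is what the search optimizes.
path = [("A", 0), ("A", 1), ("B", 0), ("A", 2), ("B", 1), ("A", 3)]

def forward(h, path):
    """Run a hidden state through layers in the order given by the path."""
    for model_id, layer_idx in path:
        h = banks[model_id][layer_idx](h)
    return h

print(forward(np.ones(3), path))
```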
The method delivers strong results, scoring 52.0 on the MGSM-JA benchmark, with DFS-merged models also showing improved performance. A hybrid model that combines both merging strategies improved results further, adding to the method's credibility.
The approach is also highly efficient and generalizes surprisingly well: a 7B-parameter model produced with it surpassed previous 70B-parameter Japanese LLMs on the associated benchmarks.
In addition, the method produced a culturally aware Japanese Vision-Language Model (VLM). This model performed well when tested on a domestically sourced dataset of Japanese image-description pairs, demonstrating its ability to handle Japanese culture-specific content.
In conclusion, Sakana AI's research introduces a novel approach that uses evolutionary algorithms to automatically discover effective combinations of models from different domains. The resulting models achieved state-of-the-art results across benchmarks and exhibit cultural awareness. By automating this process, model development becomes more efficient and can yield models with robust and sometimes unexpected capabilities.
For further details, the research paper, GitHub repository, and blog are readily available. All credit for this research goes to the team at Sakana AI.