Collaborative AI Research by Sun Yat-sen University and Tencent AI Lab Presents FuseLLM: A Breakthrough in Integrating Diverse Large Language Models to Improve Their Capabilities

Large language models (LLMs) like GPT and LLaMA have revolutionized natural language processing tasks. However, creating these models from scratch is a resource- and energy-intensive process. There is therefore growing interest in more cost-effective alternatives, and one such innovative approach is the fusion of pre-trained LLMs into a more efficient model that leverages their collective strengths.

Fusing LLMs is challenging due to the diversity of their architectures. The concept of knowledge fusion aims to combine these models into a more potent one, maximizing their strengths while minimizing costs. However, conventional methods such as ensemble strategies and weight merging face practical challenges and seldom yield optimal results when applied to models with significant differences in their parameter spaces.
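To see why weight merging presupposes structurally identical models, consider a minimal sketch of naive parameter averaging (the `merge_weights` helper below is hypothetical, for illustration only, not code from the paper):

```python
import torch

def merge_weights(state_dicts, weights=None):
    """Average corresponding parameter tensors across models.

    This only works when every model shares an identical state-dict
    layout (same keys, same tensor shapes). Heterogeneous models like
    Llama-2, MPT, and OpenLLaMA do not, so this approach breaks down,
    which is the limitation knowledge fusion aims to avoid.
    """
    weights = weights or [1.0 / len(state_dicts)] * len(state_dicts)
    merged = {}
    for name in state_dicts[0]:
        # Raises a key or shape mismatch error if architectures differ.
        merged[name] = sum(w * sd[name] for w, sd in zip(weights, state_dicts))
    return merged
```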

Researchers from Sun Yat-sen University and Tencent AI Lab proposed a new form of knowledge fusion that leverages the generative distributions of the source LLMs. The method transfers their knowledge to a target LLM through lightweight continual training, focusing on aligning and fusing the probabilistic distributions produced by the source models. It also involves aligning tokenizations across the different LLMs for effective knowledge fusion, and evaluating the quality of each LLM to assign varying importance levels to its distribution matrices.
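As a rough illustration, the continual-training objective can be thought of as a standard language-modeling loss plus a term that pulls the target model's token distribution toward a quality-weighted fusion of the source models' distributions. The sketch below assumes the source distributions have already been mapped onto the target's vocabulary via tokenizer alignment; the helper name, weighting scheme, and the value of `lam` are illustrative assumptions, not the paper's exact formulation:

```python
import torch
import torch.nn.functional as F

def fusion_loss(target_logits, source_probs_list, source_weights,
                gold_ids, lam=0.9):
    """Combine a causal LM loss with a distribution-fusion term.

    target_logits:     [batch, seq, vocab] logits from the target LLM
    source_probs_list: per-source [batch, seq, vocab] probabilities,
                       already aligned to the target vocabulary
    source_weights:    per-source importance weights (quality-based)
    """
    # Standard causal LM cross-entropy against the gold next tokens.
    lm_loss = F.cross_entropy(
        target_logits.view(-1, target_logits.size(-1)), gold_ids.view(-1)
    )
    # Fuse the source distributions with importance weights, renormalize.
    fused = sum(w * p for w, p in zip(source_weights, source_probs_list))
    fused = fused / fused.sum(dim=-1, keepdim=True)
    # Divergence between the target's distribution and the fused one.
    log_q = F.log_softmax(target_logits, dim=-1)
    kd_loss = F.kl_div(log_q, fused, reduction="batchmean")
    return lam * lm_loss + (1.0 - lam) * kd_loss
```

The weighting step reflects the paper's idea of evaluating source-model quality: a stronger source LLM contributes more to the fused distribution than a weaker one.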

The performance of this new method, called FuseLLM, was tested using three distinct open-source LLMs: Llama-2, MPT, and OpenLLaMA. The fused model remarkably outperformed each source LLM in various tasks including reasoning, commonsense, and code generation, underscoring FuseLLM’s effectiveness in enhancing the capabilities of individual LLMs.

This research highlights three key points: FuseLLM is an effective method for fusing LLMs, outperforming conventional techniques; the fused model exhibits capabilities beyond those of any single source model; and the approach opens new avenues for developing potent, efficient LLMs by leveraging existing ones.

In summary, the study of knowledge fusion in LLMs paves the way for innovative language model development. By combining the capabilities of diverse LLMs, this method offers a solution to the challenges of resource-intensive model training. The results demonstrate the effectiveness of the FuseLLM approach and point to future advances in natural language processing. All credit for this research goes to the project's researchers. The full paper and GitHub documentation are available for further exploration.
