The landscape for open-source Large Language Models (LLMs) has expanded rapidly, especially after Meta’s launch of the Llama 2 model in 2023 and its successor, Llama 3, in 2024. Notable open-source LLMs include Mixtral-8x7B from Mistral AI, Alibaba Cloud’s Qwen1.5 series, Smaug from Abacus AI, and the Yi models from 01.AI, which emphasize data quality.
LLMs have transformed the field of Natural Language Processing (NLP), and running them on-device offers several advantages over traditional cloud-only deployment. One promising direction combines the two: cloud-on-device collaboration. In this arrangement, light, privacy-sensitive tasks are handled by on-device models while complex tasks are delegated to cloud-based models, allowing AI systems to achieve better performance, scalability, and flexibility.
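The cloud-on-device split described above can be sketched as a simple dispatch policy. This is a minimal illustration under assumed heuristics; the task attributes, thresholds, and model names here are hypothetical, not from any specific system.

```python
from dataclasses import dataclass

@dataclass
class Task:
    prompt: str
    private: bool       # contains sensitive user data
    complexity: int     # rough 1-10 difficulty estimate (assumed scale)

def choose_backend(task: Task) -> str:
    """Keep light or private tasks on device; send heavy ones to the cloud."""
    if task.private or task.complexity <= 3:
        return "on-device-model"
    return "cloud-model"

# Private data stays local; a demanding task goes to the larger cloud model.
print(choose_backend(Task("Summarize my notes", private=True, complexity=2)))   # on-device-model
print(choose_backend(Task("Draft a legal brief", private=False, complexity=9)))  # cloud-model
```

A real system would estimate complexity from the query itself (for example, with a small classifier), but the routing decision reduces to this kind of policy.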
Researchers from Nexa AI have introduced an approach called Octopus v4, which uses functional tokens to integrate multiple open-source models, each suited to different tasks. The model stands out in parameter understanding, selection, and query restructuring: Octopus v4 efficiently directs user queries to the most appropriate vertical model and reformats each query for improved performance.
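The routing step can be pictured as follows: the master model emits a functional token that names a vertical model, followed by the restructured query, and a dispatcher resolves the token to the right worker. This is a hedged sketch; the token format (`<nexa_k>`) is borrowed from the Octopus convention, but the registry contents and parsing details are assumptions for illustration.

```python
from dataclasses import dataclass

# Assumed registry mapping functional tokens to vertical (worker) models.
FUNCTIONAL_TOKENS = {
    "<nexa_0>": "math-llm",
    "<nexa_1>": "bio-llm",
    "<nexa_2>": "law-llm",
}

@dataclass
class RoutedQuery:
    worker: str   # vertical model chosen by the master model
    query: str    # query restructured for that model

def dispatch(master_output: str) -> RoutedQuery:
    """Parse master output of the assumed form '<nexa_k> restructured query'."""
    token, _, query = master_output.partition(" ")
    worker = FUNCTIONAL_TOKENS.get(token)
    if worker is None:
        raise ValueError(f"unknown functional token: {token}")
    return RoutedQuery(worker=worker, query=query.strip())

# The master model selects the math vertical and rewrites the user's query.
routed = dispatch("<nexa_0> Compute the derivative of x**3 + 2*x.")
print(routed.worker)  # math-llm
```

Because routing is expressed as ordinary token generation, the master model needs no separate classifier head; selecting a worker is just predicting the right functional token.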
The system architecture of Octopus v4 is a graph in which every node represents a language model. These nodes are distributed across different devices, enabling multiple units to work together, with the internet handling data transfer between them. Two types of nodes are used in this setup: worker nodes, each a separate language model that can be deployed on serverless infrastructure (for example, on top of Kubernetes); and master nodes, which employ a base model with fewer than 10 billion parameters.
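The graph topology above can be sketched as a master node whose outgoing edges, labeled by functional tokens, point to worker nodes on different devices. The node names, parameter counts, and device placements below are illustrative assumptions, not details from the Octopus v4 release.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    params_b: float   # parameter count in billions
    device: str       # where the model is hosted ("on-device" or "cloud")

@dataclass
class ModelGraph:
    master: Node
    workers: dict = field(default_factory=dict)

    def add_worker(self, node: Node, edge_token: str) -> None:
        # Each edge is labeled by the functional token the master emits
        # to hand a query to this worker.
        self.workers[edge_token] = node

    def route(self, edge_token: str) -> Node:
        return self.workers[edge_token]

# Master node: a base model kept under 10 billion parameters, per the design.
graph = ModelGraph(master=Node("octopus-v4", 3.0, "on-device"))
graph.add_worker(Node("math-llm", 8.0, "cloud"), "<nexa_0>")
graph.add_worker(Node("law-llm", 8.0, "cloud"), "<nexa_1>")

print(graph.route("<nexa_0>").name)  # math-llm
```

Keeping the master small means the routing decision can run on-device, while the heavier vertical models live wherever capacity is available.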
Researchers compared the Octopus v4 system’s performance with other models on the MMLU benchmark, pairing the 3-billion-parameter Octopus v4 master model with worker language models of up to 8 billion parameters each. This evaluation aimed to demonstrate the effectiveness of the Octopus v4 system.
In summary, Nexa AI’s Octopus v4 represents a pioneering approach that uses functional tokens to integrate several open-source models, each optimized for specific tasks. It delivered notable performance compared with other models on the MMLU benchmark. Planned enhancements include incorporating additional vertical-specific models and developing advanced Octopus v4 models with multi-agent capabilities.