Artificial Intelligence (AI) is a rapidly advancing field that often demands hefty investments, putting much of it within reach only of tech giants like OpenAI and Meta. However, an exciting breakthrough presents an exception to this norm, turning the tide in favor of democratizing AI development. Researchers from MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL) and Myshell AI have demonstrated that large language models (LLMs) at the LLaMA2 level can be trained economically.
Their research unveils a highly efficient model named JetMoE-8B. This model not only lowers the cost barrier traditionally attached to LLM training but also outperforms pricier models such as LLaMA2-7B from Meta AI. As a result, high-performing LLMs can be trained outside well-funded organizations, opening doors for many research institutions and enterprises.
JetMoE-8B is engineered to be open-source and academic-friendly, promoting an inclusive approach to AI training. With no reliance on proprietary resources, the model is trained on public datasets with open-source code. Its architecture also allows fine-tuning on affordable consumer-grade GPUs. These features make JetMoE-8B a strong choice for institutions operating on limited budgets, lowering the entry barrier to high-quality AI research and application.
Crafted with a sparsely activated structure inspired by ModuleFormer, JetMoE-8B comprises 24 blocks, each containing two types of Mixture of Experts (MoE) layers. Of its 8 billion total parameters, only 2.2 billion are active during inference, cutting computational cost without compromising performance. Impressively, JetMoE-8B outperforms rivals such as LLaMA2-7B and LLaMA-13B, both of which were trained with considerably larger budgets and more compute.
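The gap between total and active parameters comes from sparse routing: for each token, a small router picks only a few experts to run. The sketch below is not JetMoE-8B's actual layer design (which pairs attention-side and MLP-side expert mixtures); it is a minimal, illustrative top-k MoE MLP layer in PyTorch, with all dimensions and expert counts chosen arbitrarily for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoELayer(nn.Module):
    """Illustrative top-k Mixture-of-Experts MLP layer (not JetMoE's exact design).

    A router scores each token against every expert, keeps the top_k experts
    per token, and mixes their outputs with normalized router weights. Only
    the selected experts run, so active parameters per token stay well below
    the total parameter count.
    """

    def __init__(self, d_model: int, d_hidden: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts, bias=False)
        self.experts = nn.ModuleList([
            nn.Sequential(
                nn.Linear(d_model, d_hidden),
                nn.GELU(),
                nn.Linear(d_hidden, d_model),
            )
            for _ in range(num_experts)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model) -> flatten tokens for routing
        tokens = x.reshape(-1, x.shape[-1])
        scores = self.router(tokens)                    # (num_tokens, num_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)  # keep top_k experts per token
        weights = F.softmax(weights, dim=-1)

        out = torch.zeros_like(tokens)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e                # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(tokens[mask])
        return out.reshape_as(x)


# Example: 8 experts in total, but only 2 run per token,
# mirroring the "total vs. active parameters" distinction.
layer = SparseMoELayer(d_model=512, d_hidden=2048)
y = layer(torch.randn(2, 16, 512))
print(y.shape)  # torch.Size([2, 16, 512])
```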
JetMoE-8B's cost-effectiveness is noteworthy: its training cost was only around $0.08 million. Training ran for two weeks on a cluster of 96 H100 GPUs, using a two-phase methodology in which the learning rate is first warmed up linearly and held constant, then decayed exponentially in the second phase. The training corpus comprised 1.25 trillion tokens drawn from open-source datasets.
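To make the schedule concrete, here is a minimal sketch of such a two-phase learning-rate rule in Python. The hyperparameters (peak_lr, warmup_steps, where the decay begins, and the final learning-rate fraction) are placeholders for illustration, not the values actually used to train JetMoE-8B.

```python
import math

def two_phase_lr(step: int, total_steps: int, peak_lr: float = 5e-4,
                 warmup_steps: int = 2000, decay_start_frac: float = 0.5,
                 final_lr_frac: float = 0.1) -> float:
    """Phase 1: linear warmup to a constant learning rate.
    Phase 2: exponential decay toward final_lr_frac * peak_lr.
    All constants here are assumed values, not the paper's settings."""
    decay_start = int(total_steps * decay_start_frac)
    if step < warmup_steps:
        # Phase 1a: linear warmup from 0 to peak_lr
        return peak_lr * step / warmup_steps
    if step < decay_start:
        # Phase 1b: hold the learning rate constant
        return peak_lr
    # Phase 2: exponential decay over the remaining steps
    progress = (step - decay_start) / max(1, total_steps - decay_start)
    return peak_lr * math.exp(progress * math.log(final_lr_frac))

# Example: learning rate at a few points of a 100k-step run
for s in (0, 1000, 2000, 40000, 75000, 100000):
    print(s, round(two_phase_lr(s, total_steps=100_000), 6))
```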
In summary, JetMoE-8B disrupts the prevailing notion that training a superior LLM requires an enormous monetary investment by accomplishing the task for under $0.1 million. It is open source, requires little compute for fine-tuning, and outperforms rival models despite a far smaller training budget. These factors make JetMoE-8B an appealing, democratized, high-performing LLM with the potential to fuel more inclusive and widespread AI research and development. Its introduction marks a notable step toward democratizing AI, possibly setting off a wave of innovation from a more diverse array of contributors than ever before.