
FBI-LLM (Fully BInarized Large Language Model): a framework that uses autoregressive distillation to train LLMs with 1-bit binarized weights from scratch.

Transformer-based Large Language Models (LLMs) like ChatGPT and LLaMA are highly effective in tasks requiring specialized knowledge and complex reasoning. However, their massive computational and storage requirements present significant challenges to wider deployment. One solution to this problem is quantization, a method that converts 32-bit parameters to lower bit widths, greatly improving storage efficiency and computational speed. Nevertheless, extreme quantization, and binarization in particular, tends to reduce accuracy even as it maximizes efficiency.
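To make the storage argument concrete, the sketch below shows one common way to binarize a weight matrix to {-1, +1} with a per-row scaling factor (in the spirit of XNOR-Net-style schemes). The helper name and the mean-absolute-value scale are illustrative choices, not the paper's exact method.

```python
import torch

def binarize_weights(w: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
    """Binarize a full-precision weight matrix to {-1, +1} with a
    per-row scale alpha = mean(|w|), so that w ~ alpha * sign(w).
    Illustrative only; not the FBI-LLM paper's exact scheme."""
    alpha = w.abs().mean(dim=1, keepdim=True)   # per-output-row scale
    w_bin = torch.sign(w)
    w_bin[w_bin == 0] = 1                       # map exact zeros to +1
    return w_bin, alpha

# A 1-bit weight needs 1/32 of the storage of a float32 weight,
# plus a small overhead for the scaling factors.
w = torch.randn(4, 8)
w_bin, alpha = binarize_weights(w)
w_approx = alpha * w_bin                        # dequantized approximation
print((w - w_approx).abs().mean())              # reconstruction error
```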

To address these issues, researchers from the Mohamed bin Zayed University of AI and Carnegie Mellon University have introduced the Fully Binarized Large Language Model (FBI-LLM). This new approach trains large-scale binary language models from scratch to match the performance of their full-precision counterparts. Using an autoregressive distillation (AD) loss, the researchers keep the model dimensions and training data equivalent to those of the full-precision baseline while achieving competitive results in perplexity and on downstream tasks.
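The paper's exact AD loss is not reproduced here; the sketch below shows one common way to implement token-level distillation for an autoregressive model, where the student's next-token distribution is pushed toward the teacher's at every position. The function name, temperature parameter, and KL formulation are assumptions.

```python
import torch
import torch.nn.functional as F

def autoregressive_distillation_loss(student_logits: torch.Tensor,
                                     teacher_logits: torch.Tensor,
                                     temperature: float = 1.0) -> torch.Tensor:
    """KL divergence between the teacher's and the student's next-token
    distributions, averaged over all positions. Logit shapes are
    (batch, seq_len, vocab_size). Illustrative; FBI-LLM's exact loss
    may differ in weighting and temperature."""
    b, t, v = student_logits.shape
    t_probs = F.softmax(teacher_logits / temperature, dim=-1).reshape(-1, v)
    s_logprobs = F.log_softmax(student_logits / temperature, dim=-1).reshape(-1, v)
    return F.kl_div(s_logprobs, t_probs, reduction="batchmean") * temperature ** 2

# Usage: a batch of 2 sequences of length 16 over a toy vocabulary of 1000 tokens
student = torch.randn(2, 16, 1000)
teacher = torch.randn(2, 16, 1000)
loss = autoregressive_distillation_loss(student, teacher)
```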

The authors argue that FBI-LLM offers a significant advance by effectively binarizing transformer-based LLMs. All linear modules except the causal language-modeling head are replaced with FBI-linear layers, while the embedding and layer-norm modules are kept at full precision, so the model retains vital semantic information and activation scaling. During training, FBI-LLM uses a full-precision teacher model for guidance, which ensures stable training from random initialization.
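As a rough illustration of what such a drop-in binarized linear layer could look like, the sketch below binarizes latent full-precision weights in the forward pass, applies learnable per-output-channel scale and shift parameters, and routes gradients to the latent weights with a straight-through estimator. The class name FBILinear and all implementation details are assumptions, not the authors' released code.

```python
import torch
import torch.nn as nn

class FBILinear(nn.Module):
    """Illustrative binarized linear layer: latent full-precision weights
    are binarized to {-1, +1} in the forward pass and then re-scaled and
    shifted by learnable per-output-channel parameters (alpha, beta).
    Details are assumptions, not the paper's reference implementation."""
    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.02)
        self.alpha = nn.Parameter(torch.ones(out_features, 1))    # learnable scale
        self.beta = nn.Parameter(torch.zeros(out_features, 1))    # learnable shift

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.weight
        w_bin = torch.sign(w)
        w_bin = torch.where(w_bin == 0, torch.ones_like(w_bin), w_bin)
        # Straight-through estimator: the forward pass uses the binarized
        # weights, while gradients flow to the latent full-precision weights.
        w_ste = w + (w_bin - w).detach()
        w_eff = self.alpha * w_ste + self.beta
        return nn.functional.linear(x, w_eff)

# Usage: swap in for nn.Linear inside a transformer block (except the head)
layer = FBILinear(512, 512)
y = layer(torch.randn(4, 512))    # shape (4, 512)
```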

To test their methodology, the researchers trained FBI-LLMs at three sizes (130M, 1.3B, and 7B parameters) on the expansive Amber dataset. Across a range of tasks, including BoolQ, PIQA, and Winogrande, the FBI-LLMs demonstrated competitive results, often surpassing comparable binarized and full-precision models.

The researchers view the FBI-LLM methodology as a significant step forward, given that it strikes a fine balance between model size and performance. However, they acknowledge several limitations. Binarization inevitably causes some performance loss compared to full-precision models. The distillation process adds computational overhead, and current hardware constraints prevent binarized LLMs from delivering direct speed improvements. Finally, ethical concerns associated with pretrained LLMs, including biases, privacy risks, and potential misinformation, remain even after binarization. Despite these challenges, the work represents a promising advance in the efficient training and use of LLMs.
