Large language models (LLMs), exemplified by dense transformer models such as GPT-3 and PaLM, have revolutionized natural language processing: their vast parameter counts deliver record levels of accuracy and make them essential to data management tasks. These models, however, are enormous and memory-hungry, overwhelming even the strongest Graphics Processing Units (GPUs), which top out at around 80 GB of memory.
Accommodating such substantial models means aggregating memory from many GPUs; fine-tuning a model with 100 billion parameters would require roughly 32 NVIDIA A100 GPUs. This approach is prohibitively expensive, particularly for academic researchers with limited access to high-end GPU servers. To alleviate this challenge, researchers from Zhejiang University have proposed Fuyou, a low-cost training framework that enables efficient fine-tuning of models with up to 100 billion parameters on low-end servers equipped with a less powerful GPU and limited CPU memory.
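To see why memory must be aggregated across many GPUs, a back-of-the-envelope calculation helps. The sketch below assumes roughly 16 bytes of model and optimizer state per parameter for mixed-precision Adam fine-tuning (fp16 weights and gradients plus fp32 master weights, momentum, and variance); this breakdown is an illustrative assumption on our part, not a figure taken from the Fuyou work.

```python
# Back-of-the-envelope memory estimate for full fine-tuning with mixed-precision
# Adam. The 16 bytes/parameter breakdown is an assumption for illustration:
#   2 B fp16 weights + 2 B fp16 gradients
#   + 4 B fp32 master weights + 4 B fp32 momentum + 4 B fp32 variance
BYTES_PER_PARAM = 16
PARAMS = 100e9            # a 100-billion-parameter model
A100_MEMORY_GB = 80       # memory of one NVIDIA A100-80GB

state_gb = PARAMS * BYTES_PER_PARAM / 1e9
gpus_for_state = state_gb / A100_MEMORY_GB

print(f"Model + optimizer state: ~{state_gb:,.0f} GB")
print(f"A100-80GB GPUs needed for that state alone: ~{gpus_for_state:.0f}")
# ~1,600 GB, i.e. ~20 GPUs before counting activations, communication buffers,
# and fragmentation, which is how practical setups end up in the ~32-GPU range.
```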
Fuyou rests on three key innovations: a synchronous out-of-core CPU optimizer that overlaps with backward propagation to maximize GPU utilization; a fully pipelined activation-swapping mechanism that enables fine-tuning of considerably larger models; and automatic activation swapping management, which determines the optimal amount of activations to swap in order to minimize epoch time.
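As a rough intuition for the activation-swapping idea, the snippet below parks the activations saved for the backward pass in CPU memory and copies them back to the GPU only when backward needs them, using PyTorch's saved-tensors hooks. It is a minimal conceptual sketch of ours, not Fuyou's implementation, which additionally pipelines these transfers (and SSD traffic) with GPU compute and overlaps the CPU optimizer step with backward propagation.

```python
import torch
from torch import nn

def pack_to_cpu(tensor):
    # Called when autograd saves an activation for backward: move it off the GPU.
    return tensor.device, tensor.to("cpu")

def unpack_from_cpu(packed):
    # Called during backward: copy the activation back to its original device.
    device, cpu_tensor = packed
    return cpu_tensor.to(device)

device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Sequential(nn.Linear(1024, 4096), nn.GELU(), nn.Linear(4096, 1024)).to(device)
x = torch.randn(8, 1024, device=device)

with torch.autograd.graph.saved_tensors_hooks(pack_to_cpu, unpack_from_cpu):
    loss = model(x).square().mean()   # forward: saved activations land in CPU memory
loss.backward()                        # backward: activations are swapped back in
```

Even this naive version frees GPU memory in proportion to the activations it offloads; the difficulty Fuyou addresses is doing such swapping at scale without stalling the GPU, which is where its pipelining and automatic swap management come in.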
Fuyou performs exceptionally well, whether running on a cutting-edge A100-80GB or on a consumer-grade RTX 4090. It reaches 87 TFLOPS on the 4090 and 172 TFLOPS on the A100-80GB when fine-tuning a GPT-3 175B model, and it outperforms ZeRO-Infinity by up to 3.47x in TFLOPS when fine-tuning a GPT-3 13B model. Compared with Megatron-LM using tensor parallelism, Fuyou achieves up to 1.7x better cost-effectiveness.
In conclusion, Fuyou promises to be a significant advancement among low-cost training frameworks, specifically tuned for low-end servers with less capable GPUs and limited CPU memory. It delivers outstanding results when fine-tuning huge models while outstripping competitors in both performance and cost-effectiveness. This breakthrough could bring powerful natural language processing tools within reach of researchers with limited budgets and access to less advanced hardware. Credit for this pioneering research goes to the team at Zhejiang University.