Large Language Models (LLMs) represent a significant advancement across several application domains, delivering remarkable results in a variety of tasks. Despite these benefits, their massive size incurs substantial computational costs, making them challenging to adapt to specific downstream tasks, particularly on hardware with limited computational capability.
With billions of parameters, these models require significant computational resources to run. Prior research has shown that LLMs exhibit strong generalization abilities, applying what they have learned to tasks they never encountered during training, a phenomenon referred to as zero-shot learning. Even so, fine-tuning remains essential to optimize LLM performance on new user datasets and tasks.
A common fine-tuning strategy involves adjusting a small fraction of LLM parameters while keeping the rest frozen, a technique known as Parameter-Efficient Fine-Tuning (PEFT). Beyond Natural Language Processing (NLP), PEFT's applicability extends to computer vision (CV), including Vision Transformers (ViT) and diffusion models, as well as interdisciplinary vision-language models.
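To make the core idea concrete, here is a minimal, illustrative PyTorch sketch of "tune a fraction, freeze the rest." The toy model, layer sizes, and the choice to tune only the final layer are assumptions for demonstration, not a specific method from the survey:

```python
import torch.nn as nn

# Toy stand-in for a pretrained model: most parameters live in the backbone.
model = nn.Sequential(
    nn.Linear(512, 512), nn.ReLU(),
    nn.Linear(512, 512), nn.ReLU(),
    nn.Linear(512, 10),  # small task head (assumed for illustration)
)

# PEFT-style setup: freeze everything, then unfreeze only the head.
for p in model.parameters():
    p.requires_grad = False
for p in model[-1].parameters():
    p.requires_grad = True

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable: {trainable:,} / {total:,} ({100 * trainable / total:.2f}%)")
```

The optimizer then only ever sees the small trainable subset, which is where the memory and compute savings come from.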
Researchers from Northeastern University, the University of California, Arizona State University, and New York University have put together a survey that examines PEFT algorithms and evaluates their performance and computational requirements. The study also provides an overview of applications developed using PEFT methods and explores strategies to reduce the computational costs associated with PEFT. The survey takes a closer look at real-world system designs to explore the implementation costs of different PEFT algorithms, offering researchers valuable insights.
PEFT algorithms are categorized into additive, selective, reparameterized, and hybrid fine-tuning, depending on their operations. Selective fine-tuning, for instance, picks a small subset of parameters from the base model, and only those parameters are tunable during task fine-tuning. Reparameterized fine-tuning transforms model parameters between two equivalent forms, introducing a small number of additional trainable parameters, as the sketch below illustrates.
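As a concrete illustration of the reparameterized category, here is a minimal LoRA-style sketch in PyTorch, LoRA being the best-known method of this kind. The class name, rank `r`, scaling `alpha`, and layer dimensions are illustrative assumptions, not the survey's implementation:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen pretrained linear layer plus a low-rank trainable update."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # the pretrained weights stay fixed

        # Low-rank factors: effective weight is W + (alpha / r) * B @ A,
        # with only A and B trained.
        self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * (x @ self.lora_A.T @ self.lora_B.T)

layer = LoRALinear(nn.Linear(1024, 1024))
y = layer(torch.randn(2, 1024))  # same output shape as the original layer
```

Because `lora_B` starts at zero, the layer initially behaves exactly like the pretrained one, and the low-rank update can be merged back into the base weights after training.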
One strategy to accelerate LLM inference is to store previously computed keys and values in a Key-Value (KV) cache, eliminating the need to recompute them for each new token.
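The following is a minimal single-head sketch of how such a cache works during token-by-token decoding. The `attend` helper, the dimension `d`, and the random tensors standing in for real key/value/query projections are assumptions for illustration:

```python
import torch

def attend(q, k, v):
    # Standard scaled dot-product attention over the full key/value history.
    scores = q @ k.transpose(-2, -1) / k.shape[-1] ** 0.5
    return torch.softmax(scores, dim=-1) @ v

d = 64
k_cache = torch.empty(0, d)  # keys for all previously processed tokens
v_cache = torch.empty(0, d)  # values for all previously processed tokens

for step in range(5):  # one iteration per generated token
    # In a real model these come from projecting the newest token's hidden state.
    q_new, k_new, v_new = torch.randn(1, d), torch.randn(1, d), torch.randn(1, d)

    # Append only the new key/value instead of recomputing the whole history.
    k_cache = torch.cat([k_cache, k_new], dim=0)
    v_cache = torch.cat([v_cache, v_new], dim=0)

    out = attend(q_new, k_cache, v_cache)  # attention reuses the cached history
```

The trade-off is memory: the cache grows linearly with sequence length, which is one of the system-level costs such surveys examine.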
In conclusion, this survey provides an extensive exploration of diverse PEFT algorithms, categorizing the methods and offering insights into their performance, applications, and implementation costs. It serves as valuable guidance for researchers navigating the intricacies of fine-tuning large models.