In a recent AI research paper, Google researchers have developed Cappy, a new pre-trained scorer model designed to improve upon, and in some settings surpass, the capabilities of large multi-task language models (LLMs). The work aims to tackle a primary issue with LLMs: although they demonstrate remarkable performance across numerous natural language processing tasks, their enormous size necessitates substantial computational resources, making training and inference expensive and inefficient, particularly when deploying them to downstream applications.
Presently, several multi-task LLMs such as T0, FLAN, and OPT-IML are leveraged for a wide range of natural language processing tasks, trained under a unified instruction-following scheme. However, adapting these models to downstream applications, especially complex ones, poses additional challenges due to high hardware requirements and limited access to high-end LLMs. To counter these challenges, the researchers introduced Cappy, a lightweight pre-trained scorer created specifically to bolster the performance and efficiency of multi-task LLMs. Cappy operates independently on classification tasks or as an auxiliary component for LLMs, improving their performance without requiring extensive fine-tuning or access to LLM parameters.
Cappy employs a RoBERTa-based architecture with a linear layer on top for regression. It is pretrained on a diversified collection of datasets from PromptSource, covering an array of task types. To meet the need for label diversity in the pretraining data, the researchers propose a data construction method involving ground-truth pairs, incorrect responses, and data augmentation using existing multi-task LLMs. The result is an extensive and effective regression pretraining dataset.
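The data construction described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name and the simple 1.0/0.0 labeling are assumptions, and the paper's additional step of scoring LLM-generated responses by similarity to the ground truth is omitted.

```python
import random

def build_regression_pairs(examples, seed=0):
    """Build (instruction, response, score) triples for scorer pretraining.

    Ground-truth responses are labeled 1.0; responses borrowed from other
    examples serve as incorrect candidates labeled 0.0. (A fuller pipeline
    would also add LLM-generated responses with graded similarity scores.)
    """
    rng = random.Random(seed)
    pairs = []
    for i, (instruction, answer) in enumerate(examples):
        pairs.append((instruction, answer, 1.0))  # correct pair
        # Sample a response from a different example as a negative.
        j = rng.choice([k for k in range(len(examples)) if k != i])
        pairs.append((instruction, examples[j][1], 0.0))  # incorrect pair
    return pairs
```

Each triple can then be fed to a regression head (such as a linear layer on top of a RoBERTa encoder) that learns to predict the score from the instruction-response pair.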
Cappy’s implementation includes a candidate-selection process that produces a score for each potential response given an instruction. It can function independently on classification tasks or as an auxiliary component for generation tasks, improving the decoding of existing multi-task LLMs. Notably, it allows the efficient adaptation of multi-task LLMs to downstream tasks without requiring fine-tuning or access to LLM parameters.
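A candidate-selection step of this kind can be sketched in a few lines. The function and scorer names here are hypothetical; the only assumption, consistent with the description above, is a scorer callable that maps an (instruction, response) pair to a correctness score.

```python
def select_best_response(instruction, candidates, scorer):
    """Rank candidate responses with a Cappy-style scorer and return the best.

    `scorer(instruction, response)` is assumed to return a score where
    higher means a more suitable response, as a regression-head scorer
    would; any model exposing that interface can be plugged in.
    """
    scored = [(scorer(instruction, c), c) for c in candidates]
    best_score, best = max(scored, key=lambda t: t[0])
    return best, best_score
```

In the standalone classification setting, the candidates are the task's label strings; in the auxiliary setting, they are sampled generations from a multi-task LLM, which is why no access to the LLM's parameters is needed.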
In summary, the paper introduces Cappy as a solution to the challenge of using large language models efficiently in multi-tasking scenarios. It demonstrates superior parameter efficiency and strong performance across various tasks, illustrating its potential to facilitate the application of large language models in practical settings. Future work will focus on improving Cappy’s design and leveraging it as a vital tool in language model research and applications. Credit for the research goes entirely to the project’s researchers.