Researchers have been searching for an effective way to leverage in-context learning in transformer-based models such as GPT-3 and its successors. Although the technique has clearly boosted performance, how it actually works remains only partially understood. To address this, a team of researchers from the University of California, Los Angeles (UCLA) examined the factors that shape in-context learning. They found that providing accurate examples is not, by itself, what drives success; instead, the structure of the prompt, the size of the model, and the order of the examples significantly affect the outcome.
The UCLA team ran a series of binary classification tasks (BCTs) to assess how in-context learning behaves in transformers and large language models (LLMs). They combined theoretical and empirical analysis, and also examined 'learning to learn in-context' using the meta-training framework MetaICL. The study treated in-context learning as a learning algorithm in its own right, applying traditional machine-learning tools to analyze the decision boundaries it produces on binary classification tasks, thereby shedding light on its performance and behavior.
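The paper's own code is not reproduced here, but the idea of treating the in-context learner as a classifier can be illustrated with a short Python sketch: serialize labeled points into a prompt, query the model on a dense 2-D grid, and read off the predicted label at every grid point. The query_llm helper and the prompt format below are assumptions for illustration, not the authors' implementation.

```python
# Sketch (not the authors' code): probe an in-context learner's decision
# boundary by querying it on a dense grid of 2-D test points.
import numpy as np

def query_llm(prompt: str) -> str:
    """Hypothetical stand-in for a call to an LLM (e.g. Llama2-7B)."""
    raise NotImplementedError("plug in your model or API call here")

def build_prompt(examples, query_point):
    """Serialize n labeled (x, y) examples plus one query point into a prompt."""
    lines = [f"Input: {x[0]:.2f} {x[1]:.2f} Label: {label}" for x, label in examples]
    lines.append(f"Input: {query_point[0]:.2f} {query_point[1]:.2f} Label:")
    return "\n".join(lines)

def decision_map(examples, grid_size=50, lo=-3.0, hi=3.0):
    """Classify every point on a 2-D grid to visualize the decision boundary."""
    xs = np.linspace(lo, hi, grid_size)
    ys = np.linspace(lo, hi, grid_size)
    preds = np.zeros((grid_size, grid_size), dtype=int)
    for i, x in enumerate(xs):
        for j, y in enumerate(ys):
            answer = query_llm(build_prompt(examples, (x, y)))
            preds[j, i] = 1 if "1" in answer else 0
    return xs, ys, preds  # plot with matplotlib's contourf to see the boundary
```

The grid of predictions can then be contour-plotted to inspect how smooth or jagged the model's boundary is relative to the ground-truth task.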
The experiments centered on three questions: How do pre-trained LLMs perform on BCTs? How do different factors influence their decision boundaries? And how can the smoothness of those boundaries be improved? By testing several LLMs, including the open-source models Llama2-7B, Llama3-8B, Llama2-13B, Mistral-7B-v0.1, and Sheared-LLaMA-1.3B, with varying numbers of in-context examples drawn from BCTs, the researchers visualized decision boundaries across tasks of different shapes, including linear, circular, and moon-shaped.
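The three boundary shapes correspond to standard synthetic 2-D datasets, so one plausible way to generate such tasks and format their points as in-context examples is with scikit-learn, as in the sketch below; the exact data-generation procedure and prompt format used in the paper may differ.

```python
# Sketch: generate linear, circular, and moon-shaped binary classification
# tasks and format n of their points as in-context examples.
from sklearn.datasets import make_classification, make_circles, make_moons

def make_task(kind: str, n: int, seed: int = 0):
    """Return n labeled 2-D points with the requested boundary shape."""
    if kind == "linear":
        X, y = make_classification(n_samples=n, n_features=2, n_redundant=0,
                                   n_informative=2, n_clusters_per_class=1,
                                   random_state=seed)
    elif kind == "circular":
        X, y = make_circles(n_samples=n, noise=0.05, factor=0.5, random_state=seed)
    elif kind == "moon":
        X, y = make_moons(n_samples=n, noise=0.1, random_state=seed)
    else:
        raise ValueError(f"unknown task kind: {kind}")
    return X, y

def to_prompt(X, y):
    """Serialize labeled points into a plain-text in-context prompt."""
    return "\n".join(f"Input: {a:.2f} {b:.2f} Label: {label}"
                     for (a, b), label in zip(X, y))

# Example: 32 in-context examples from a moon-shaped task.
X, y = make_task("moon", n=32)
print(to_prompt(X, y))
```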
The findings indicated that fine-tuning LLMs does not necessarily result in smoother decision boundaries. After studying the Llama3-8B model, the researchers found that even when it was fine-tuned on 128 in-context learning examples, its decision boundaries remained non-smooth. To improve boundary smoothness, a pre-trained Llama model was instead fine-tuned on a set of 1,000 binary classification tasks featuring different decision boundaries.
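To make the scale of that fine-tuning setup concrete, here is a hedged sketch of how a corpus of 1,000 synthetic binary classification tasks with varied boundary shapes might be assembled, with each task's final point held out as the prediction target. The task mix and prompt format are assumptions for illustration; the paper's actual recipe may differ.

```python
# Sketch: assemble a fine-tuning corpus of 1,000 synthetic binary
# classification tasks whose decision boundaries vary in shape.
import random
from sklearn.datasets import make_classification, make_circles, make_moons

def sample_task(kind: str, n: int, seed: int):
    """Return n labeled 2-D points with the requested boundary shape."""
    if kind == "moon":
        return make_moons(n_samples=n, noise=0.1, random_state=seed)
    if kind == "circular":
        return make_circles(n_samples=n, noise=0.05, factor=0.5, random_state=seed)
    return make_classification(n_samples=n, n_features=2, n_redundant=0,
                               n_informative=2, n_clusters_per_class=1,
                               random_state=seed)

def build_corpus(num_tasks: int = 1000, n: int = 33):
    """One record per task: n-1 in-context examples plus a query point,
    with the query's label as the training target."""
    corpus = []
    for seed in range(num_tasks):
        kind = random.choice(["linear", "circular", "moon"])
        X, y = sample_task(kind, n, seed)
        context = "\n".join(f"Input: {a:.2f} {b:.2f} Label: {label}"
                            for (a, b), label in zip(X[:-1], y[:-1]))
        query = f"Input: {X[-1][0]:.2f} {X[-1][1]:.2f} Label:"
        corpus.append({"prompt": context + "\n" + query, "target": str(y[-1])})
    return corpus

corpus = build_corpus()
print(len(corpus), "tasks built")
```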
In conclusion, the UCLA researchers proposed a new lens for understanding in-context learning in LLMs: examining their decision boundaries on BCTs. Despite high test accuracy, they found that these decision boundaries are often non-smooth. They identified the factors that influence boundary smoothness and showed that fine-tuning and adaptive sampling methods help improve it. These results are expected to guide future research and optimization in the field.