Large language models (LLMs), such as those that power AI chatbots like ChatGPT, are highly complex. While these powerful tools are used in diverse applications like customer support, code generation, and language translation, they remain something of a mystery to the scientists who work with them. To develop a deeper understanding of their inner workings, researchers from MIT and other institutions set out to investigate the mechanisms LLMs use to retrieve stored knowledge.
Interestingly, the researchers found that these complex models often use surprisingly simple linear functions to retrieve and decode stored facts. Each of these functions is a simple equation with no squared or cubed terms, describing a straightforward, linear relationship between two variables. The researchers also discovered that the model applies the same decoding function to similar types of facts.
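As a rough illustration of what such a linear decoding step looks like, the sketch below applies a relation-specific weight matrix and bias vector to a subject's hidden-state vector. The dimensions, random stand-in values, and function names here are assumptions for illustration only; in practice the subject representation comes from an intermediate layer of the LLM and the weights are estimated from the model itself.

```python
import numpy as np

# Illustrative size and random stand-in values; real hidden states and
# weights come from the trained language model, not a random generator.
HIDDEN_DIM = 1024
rng = np.random.default_rng(0)

# One (W, b) pair per relation, e.g. "plays the instrument".
W_instrument = rng.normal(size=(HIDDEN_DIM, HIDDEN_DIM))
b_instrument = rng.normal(size=HIDDEN_DIM)

def decode_attribute(subject_state: np.ndarray,
                     W: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Approximate the attribute representation as a linear function of the
    subject's hidden state: a ≈ W @ s + b (no squared or cubed terms)."""
    return W @ subject_state + b

# Placeholder for the hidden state the model produces for a subject
# such as "Miles Davis".
subject_state = rng.normal(size=HIDDEN_DIM)
attribute_state = decode_attribute(subject_state, W_instrument, b_instrument)
```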
Even when a model responds to a prompt incorrectly, the researchers found that it often still stores the correct information. This insight could be used to find and correct erroneous information inside the model, potentially reducing the likelihood of an AI giving nonsensical or incorrect responses.
Even though these models are intricate, nonlinear functions trained on copious amounts of data and very difficult to comprehend, they sometimes use very simple mechanisms. Evan Hernandez, an electrical engineering and computer science graduate student and co-lead author of a paper describing these findings, said their research offers one example of this.
The study was a collaborative effort involving several MIT researchers, along with Arnab Sharma, a computer science graduate student at Northeastern University; Jacob Andreas, an MIT associate professor and member of the Computer Science and Artificial Intelligence Laboratory (CSAIL); and David Bau, an assistant professor of computer science at Northeastern University. The findings will be presented at the International Conference on Learning Representations (ICLR).
In their study, the researchers found that even though LLMs are incredibly complex, they decode relational information using simple linear functions, with each function specific to the type of fact being retrieved. For instance, the model would use one decoding function to predict the instrument a person plays and a different one to predict the state where they were born.
The team developed a method to estimate these simple functions and used it to study 47 relations, such as “capital city of a country” and “lead singer of a band.” The resulting functions retrieved the correct information more than 60 percent of the time, indicating that some information in a transformer is encoded and retrieved using linear functions. The researchers also devised a visualization tool they call an “attribute lens,” which shows where specific information about a given relation is stored within the transformer’s layers.
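The article does not spell out how these relation-specific functions are estimated. As one possible sketch, the snippet below fits a weight matrix and bias by ordinary least squares over example pairs of subject and attribute hidden states, then measures how often the decoded vector ranks the correct token first. The least-squares fit, the unembedding matrix, and all names are assumptions for illustration, not the authors' exact procedure.

```python
import numpy as np

def fit_linear_relation(subject_states: np.ndarray,
                        attribute_states: np.ndarray):
    """Fit a ≈ W @ s + b by ordinary least squares over example pairs.

    subject_states:   (n_examples, hidden_dim) hidden states for subjects
                      such as "France", "Japan", ...
    attribute_states: (n_examples, hidden_dim) hidden states associated
                      with the correct attributes such as "Paris", "Tokyo".
    """
    n, d = subject_states.shape
    # Augment with a constant column so the bias b is fit jointly with W.
    X = np.hstack([subject_states, np.ones((n, 1))])
    coef, *_ = np.linalg.lstsq(X, attribute_states, rcond=None)
    W = coef[:d].T          # (hidden_dim, hidden_dim)
    b = coef[d]             # (hidden_dim,)
    return W, b

def top1_accuracy(W, b, subject_states, target_ids, unembed):
    """Count a prediction as correct when projecting the decoded vector
    through the vocabulary matrix gives the target token the top logit."""
    preds = subject_states @ W.T + b     # (n_examples, hidden_dim)
    logits = preds @ unembed.T           # (n_examples, vocab_size)
    return float(np.mean(logits.argmax(axis=1) == target_ids))
```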
Going forward, the team aims to better understand the cases in which facts are not stored linearly, and plans to run experiments with larger models as well as to study the precision of linear decoding functions. Mor Geva Pipek, an assistant professor at Tel Aviv University, said the research reveals an important insight into how LLMs recall factual knowledge during inference.
This research was financially supported by Open Philanthropy, the Israel Science Foundation, and an Azrieli Foundation Early Career Faculty Fellowship.