Large language models (LLMs), such as those that power AI chatbots, are enormously complex, and scientists are still trying to understand how they function. Researchers from MIT and other institutions studied how these models retrieve stored knowledge and found that LLMs often use a surprisingly simple linear function to recover and decode it, with the same function applying to similar kinds of facts.
Linear functions, equations with only two variables and no exponents, capture a simple straight-line relationship between those variables. The researchers identified linear functions for different types of facts and used them to probe what the model knows about various subjects and where that knowledge is located within the model. They developed a technique to estimate these functions and found that, even when a model answers a prompt incorrectly, it often still has the correct information stored.
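In concrete terms, the finding can be pictured as an affine map from the model's internal representation of a subject to a representation of the answer. The sketch below is purely illustrative: the dimensions are made up and random vectors stand in for real hidden states; it is not the authors' code.

```python
import numpy as np

# Illustrative only: one relation (e.g., "plays the instrument") is decoded by
# an affine map from the subject's hidden state s to an object representation o:
#     o  ~=  W @ s + b
# Shapes and values below are placeholders, not taken from any real model.
d_model = 512
rng = np.random.default_rng(0)

W = rng.normal(size=(d_model, d_model))   # relation-specific weight matrix
b = rng.normal(size=d_model)              # relation-specific bias

s_subject = rng.normal(size=d_model)      # stand-in for the hidden state of "Miles Davis"
o_decoded = W @ s_subject + b             # approximate representation of "trumpet"
print(o_decoded.shape)                    # (512,)
```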
This approach could eventually be used to find and correct falsehoods inside the model, reducing the chance of incorrect or nonsensical responses. The paper detailing the findings was written by graduate students from MIT and Northeastern University, along with faculty members from both institutions, and will be presented at the International Conference on Learning Representations (ICLR).
The researchers explained that LLMs are built on transformer neural networks which, somewhat like the human brain, contain billions of interconnected nodes, or neurons, grouped into many layers that encode and process data. Much of the knowledge stored in a transformer can be represented as relations connecting subjects and objects, such as "Miles Davis plays the trumpet," which links the subject Miles Davis to the object trumpet.
In their experiments, the researchers found that even though these models are extremely complex, they decode relational information using a simple linear function, with each function specific to the type of fact being retrieved. The team developed a method to estimate these functions and used it to compute functions for 47 different relations.
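One generic way to estimate such a relation-specific function, shown below as a sketch rather than the authors' actual procedure, is to collect pairs of subject and object representations for a relation and fit the weights by ordinary least squares. The data here are synthetic and the helper name fit_relation_function is hypothetical.

```python
import numpy as np

def fit_relation_function(subject_states: np.ndarray, object_states: np.ndarray):
    """Fit an affine map o ~= W @ s + b by ordinary least squares.

    subject_states: (n_examples, d_model) hidden states of subjects for one relation
    object_states:  (n_examples, d_model) target representations of the objects
    Returns (W, b). This generic estimator is a stand-in, not the paper's method.
    """
    n, d = subject_states.shape
    X = np.hstack([subject_states, np.ones((n, 1))])   # constant column learns the bias
    coef, *_ = np.linalg.lstsq(X, object_states, rcond=None)
    return coef[:d].T, coef[d]

# Synthetic demo: generate data from a known affine map and check that it is recovered.
rng = np.random.default_rng(1)
d_model, n_examples = 64, 512
W_true = rng.normal(size=(d_model, d_model))
b_true = rng.normal(size=d_model)
S = rng.normal(size=(n_examples, d_model))
O = S @ W_true.T + b_true

W_est, b_est = fit_relation_function(S, O)
print(np.allclose(W_est, W_true, atol=1e-6), np.allclose(b_est, b_true, atol=1e-6))
```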
However, not every fact is encoded linearly. For some facts, the researchers could not find a linear function even though the model knows them and predicts text consistent with them, which suggests the model stores that information in a more intricate way.
The team also used the functions to probe what a model believes is true about different subjects. They applied this probing technique to produce an "attribute lens," a grid that visualizes where specific information about a particular relation is stored across the model's layers. Tools like this could help prevent AI chatbots from dispensing false information.
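A minimal sketch of what such an attribute lens might look like appears below: it applies one relation's linear function to the hidden state at every layer and token position and reads out the most likely attribute through the model's output vocabulary projection. The function name, shapes, and the unembedding matrix are assumptions for illustration, not a released API.

```python
import numpy as np

def attribute_lens(hidden_states, W, b, unembed, vocab):
    """Build a (layer x token) grid of the attribute a relation's linear function decodes.

    hidden_states: (n_layers, n_tokens, d_model) activations for one prompt
    W, b:          relation-specific linear function, shapes (d_model, d_model) and (d_model,)
    unembed:       (vocab_size, d_model) output-embedding matrix used to score tokens
    vocab:         list mapping token ids to strings
    """
    n_layers, n_tokens, _ = hidden_states.shape
    grid = np.empty((n_layers, n_tokens), dtype=object)
    for layer in range(n_layers):
        for pos in range(n_tokens):
            o = W @ hidden_states[layer, pos] + b   # decode the relation at this cell
            logits = unembed @ o                    # project into vocabulary space
            grid[layer, pos] = vocab[int(np.argmax(logits))]
    return grid

# Tiny synthetic demo so the sketch runs end to end (real inputs would come from an LLM).
rng = np.random.default_rng(2)
d_model, n_layers, n_tokens, vocab_size = 16, 4, 3, 10
vocab = [f"token_{i}" for i in range(vocab_size)]
grid = attribute_lens(
    rng.normal(size=(n_layers, n_tokens, d_model)),
    rng.normal(size=(d_model, d_model)),
    rng.normal(size=d_model),
    rng.normal(size=(vocab_size, d_model)),
    vocab,
)
print(grid.shape)   # (4, 3): one decoded attribute per layer and token position
```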
Looking ahead, the researchers want to understand the cases where facts are not stored linearly, run experiments with larger models, and study the precision of linear decoding functions. Other researchers have hailed the work as revealing a missing piece of how LLMs recall factual information during inference. The research was supported by Open Philanthropy, the Israeli Science Foundation, and an Azrieli Foundation Early Career Faculty Fellowship.