Large language models (LLMs), which power artificial intelligence chatbots like ChatGPT, are extremely complex, and their inner workings are not fully understood. These models are used in areas such as customer support, code generation and language translation. However, researchers from MIT and other institutions have made strides in understanding how these models retrieve stored knowledge.
In their study, they found that these models often use a simple linear function to retrieve and decode stored facts, and that the same linear function is reused for similar types of facts. Building on this insight, the researchers developed a method to estimate these functions and use them to probe what a model knows about new subjects, where that knowledge is stored, and whether the stored information is correct. This could help identify and correct false information within the model, reducing the chance of it giving incorrect responses.
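In equation form, the idea is that an object representation o can be approximated from a subject representation s as o ≈ Ws + b. The sketch below fits such an affine map with ordinary least squares over known subject/object vector pairs; this is only an illustrative stand-in for how the study actually derives its functions, and the dimensions, random data and variable names are all assumptions.

```python
import numpy as np

# Illustrative sizes; real transformer hidden states are much larger.
d = 64           # dimensionality of the representations (assumed)
n_examples = 20  # known (subject, object) pairs for one relation (assumed)

rng = np.random.default_rng(0)

# Stand-ins for subject and object representations taken from a model.
S = rng.normal(size=(n_examples, d))  # subject vectors, one per row
O = rng.normal(size=(n_examples, d))  # corresponding object vectors

# Fit an affine map o ~= W s + b by least squares. Augmenting S with a
# column of ones lets the bias b be estimated jointly with W.
S_aug = np.hstack([S, np.ones((n_examples, 1))])
coef, *_ = np.linalg.lstsq(S_aug, O, rcond=None)
W, b = coef[:d].T, coef[d]

# Apply the estimated relation function to a new subject vector.
s_new = rng.normal(size=d)
o_pred = W @ s_new + b
print(o_pred.shape)  # (64,) -- a predicted object representation
```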
LLMs, also known as transformer models, are neural networks containing billions of interconnected nodes that, somewhat like the human brain, store and process data. As a model learns, it stores facts about a subject across several layers; when queried about that subject, it must decode the most relevant fact. Despite the complexity of these models, the researchers found that this decoding of relational facts happens via a simple linear function.
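To ground this in code, the sketch below pulls a subject representation from an intermediate layer of a small open model (GPT-2, chosen purely for convenience) and pushes it through a placeholder linear relation map, reading the result out through the model’s own unembedding in the style of the well-known “logit lens” trick. The prompt, layer index and identity-map placeholder are illustrative assumptions, not the study’s setup.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# GPT-2 serves purely as a small, convenient stand-in model.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

inputs = tokenizer("Miles Davis plays the", return_tensors="pt")

with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# Subject representation: the last token's hidden state at a middle layer.
# The layer index is an illustrative choice, not a prescribed one.
s = out.hidden_states[6][0, -1]  # shape: (hidden_size,)

# Placeholder relation map; in practice W and b would be estimated,
# e.g. from known subject/object pairs as in the earlier sketch.
d = s.shape[0]
W, b = torch.eye(d), torch.zeros(d)
o_hat = W @ s + b

# Read the mapped vector out through the model's own final layer norm
# and unembedding ("logit lens" style) to get a token prediction.
with torch.no_grad():
    logits = model.lm_head(model.transformer.ln_f(o_hat))
print(tokenizer.decode(int(logits.argmax())))
```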
The researchers then computed functions for 47 different relation types, such as “capital city of a country” and “lead singer of a band”, and varied the subject of each function to test whether it could correctly retrieve the corresponding object. The correct information was recovered more than 60 percent of the time, showing that at least some of the information in a transformer is encoded and retrieved this way.
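A faithfulness check of that kind might be sketched as follows: for each known subject/object pair in a relation, apply the estimated function to the subject vector and count how often the decoded token matches the true object. The toy data and the nearest-neighbor decode step below are schematic stand-ins, not the study’s evaluation code.

```python
import numpy as np

def evaluate_relation(W, b, subject_vecs, object_ids, decode_fn):
    """Fraction of subjects whose mapped vector decodes to the true object.

    decode_fn maps a predicted object vector to a token id, e.g. by
    nearest neighbor against an unembedding matrix (an assumed helper).
    """
    hits = 0
    for s, true_id in zip(subject_vecs, object_ids):
        o_pred = W @ s + b
        if decode_fn(o_pred) == true_id:
            hits += 1
    return hits / len(object_ids)

# Toy decode step: most-aligned row of a stand-in unembedding matrix.
rng = np.random.default_rng(1)
d, vocab = 64, 500
E = rng.normal(size=(vocab, d))
decode = lambda v: int(np.argmax(E @ v))

# With random data this accuracy is meaningless; in the study, real
# relation functions recovered the correct object over 60% of the time.
W, b = np.eye(d), np.zeros(d)
subjects = rng.normal(size=(10, d))
objects = [decode(s) for s in subjects]  # consistent toy labels
print(evaluate_relation(W, b, subjects, objects, decode))
```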
The researchers also used these functions to determine what the model believes to be true about different subjects. In one experiment, they started with the prompt “Bill Bradley was a…” and applied the decoding functions for “plays sports” and “attended university” to check whether the model knows that Bradley was a basketball player who attended Princeton.
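In code, such a probe could amount to applying several estimated relation functions to one subject representation; the relation names and helpers below are invented for illustration.

```python
def probe(subject_vec, relation_maps, decode_fn):
    """Ask what the model 'believes' about one subject across relations.

    relation_maps: {relation name: (W, b)}, estimated as sketched above.
    decode_fn: maps a predicted object vector to a readable token.
    """
    return {name: decode_fn(W @ subject_vec + b)
            for name, (W, b) in relation_maps.items()}

# Schematic usage with invented names:
# probe(bradley_vec,
#       {"plays sports": (W_sport, b_sport),
#        "attended university": (W_univ, b_univ)},
#       decode)
# -> {"plays sports": "basketball", "attended university": "Princeton"}
```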
This technique was used to create an “attribute lens,” a grid that visualizes where information about a particular relation is stored across the transformer’s layers. This visualization tool could help scientists correct stored information and prevent AI chatbots from giving false answers. In the future, the researchers hope to understand what happens when facts are not stored linearly, and to extend their experiments to larger models.
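A minimal sketch of such a grid, assuming the hidden states from one forward pass and a single relation map, might score every (layer, token) position and plot the result. The dot-product score used here is a schematic stand-in for the real readout, and the random states exist only to make the example run.

```python
import numpy as np
import matplotlib.pyplot as plt

def attribute_lens_grid(hidden_states, W, b, object_direction):
    """Score each (layer, token) hidden state for one relation.

    hidden_states: array of shape (n_layers, n_tokens, d) from one
    forward pass. Each state is mapped through the relation function
    and scored against an object direction (a simplifying assumption).
    """
    n_layers, n_tokens, _ = hidden_states.shape
    grid = np.zeros((n_layers, n_tokens))
    for l in range(n_layers):
        for t in range(n_tokens):
            o_pred = W @ hidden_states[l, t] + b
            grid[l, t] = o_pred @ object_direction
    return grid

# Toy example with random states; axes mirror the visualization.
rng = np.random.default_rng(2)
states = rng.normal(size=(12, 8, 64))
grid = attribute_lens_grid(states, np.eye(64), np.zeros(64),
                           rng.normal(size=64))
plt.imshow(grid, aspect="auto")
plt.xlabel("token position")
plt.ylabel("layer")
plt.title("attribute lens (schematic)")
plt.show()
```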
This research was backed by Open Philanthropy, the Israeli Science Foundation and an Azrieli Foundation Early Career Faculty Fellowship. The findings will be presented at the International Conference on Learning Representations.