Foundation models, or large-scale deep-learning models, are becoming increasingly prevalent, particularly in powering prominent AI services such as DALL-E and ChatGPT. These models are trained on huge quantities of general-purpose, unlabeled data and then repurposed for a variety of uses, such as image generation or customer service tasks. However, the complex nature of these AI tools means that inaccuracies or misleading results can arise. In sectors where safety is paramount, such as autonomous driving, this could have dangerous consequences.
Probing this issue, researchers at MIT and the MIT-IBM Watson AI Lab developed a method to estimate a model's reliability before applying it to a specific task. They achieved this by training a set of foundation models that vary slightly from one another, then using their algorithm to assess how consistently each model's interpretations, or representations, of the same test data agree with the others. If the representations prove consistent, the model is deemed reliable.
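In rough outline, the idea can be sketched in code. The snippet below is only an illustration of the general principle, not the researchers' implementation: it assumes a list of pretrained models, each with a hypothetical encode() method that maps inputs to embedding vectors, and scores how similarly the ensemble members arrange the same test data in their embedding spaces.

```python
import numpy as np

def representation_agreement(models, test_inputs):
    """Illustrative sketch only. `models` and `encode()` are stand-in
    names assumed for this example, not the authors' actual API."""
    # Each model maps the same test inputs to a matrix of embeddings.
    embeddings = [np.asarray(m.encode(test_inputs)) for m in models]

    # Embeddings from different models live in different spaces, so compare
    # them indirectly through the pairwise cosine-similarity structure each
    # model induces over the test inputs.
    def similarity_structure(E):
        E = E / np.linalg.norm(E, axis=1, keepdims=True)
        return E @ E.T

    structures = [similarity_structure(E) for E in embeddings]

    # Average correlation between the similarity structures of all model pairs;
    # a higher score means the ensemble interprets the data more consistently.
    agreements = []
    for i in range(len(structures)):
        for j in range(i + 1, len(structures)):
            a, b = structures[i].ravel(), structures[j].ravel()
            agreements.append(np.corrcoef(a, b)[0, 1])
    return float(np.mean(agreements))
```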
When compared with established baseline techniques, the new method better captured the reliability of these large-scale pretrained models across a range of classification tasks, such as labeling images or text. The approach also removes the need to test on real-world data sets, so a model's suitability for a task can be assessed even when those data are inaccessible due to privacy concerns. Additionally, it enables users to rank AI models by their reliability scores and choose the one best suited to the task at hand.
The researchers used a concept they call "neighborhood consistency" to compare these abstract representations, which are difficult to compare directly. They prepared a set of reliable reference points to test across the ensemble of models, and then, for each model, examined the reference points located near that model's representation of the test point. If the neighboring points were consistent across models, the representation of the test point was judged reliable.
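As a loose illustration of that neighborhood-consistency idea (again using assumed names such as encode() and reference_inputs rather than the authors' code), one could measure how much the set of reference points nearest to a test point overlaps from one ensemble member to the next:

```python
import numpy as np

def neighborhood_consistency(models, reference_inputs, test_input, k=10):
    """Illustrative sketch only: for each model, find the k reference points
    closest to the test point in that model's embedding space, then measure
    how much those neighborhoods overlap across models."""
    neighbor_sets = []
    for model in models:
        ref_emb = np.asarray(model.encode(reference_inputs))  # embed reference points
        test_emb = np.asarray(model.encode([test_input]))[0]  # embed the test point
        dists = np.linalg.norm(ref_emb - test_emb, axis=1)    # distance to each reference
        neighbor_sets.append(set(np.argsort(dists)[:k]))      # indices of the k nearest

    # Fraction of shared neighbors between every pair of models;
    # values near 1.0 indicate consistent, hence more reliable, representations.
    overlaps = []
    for i in range(len(neighbor_sets)):
        for j in range(i + 1, len(neighbor_sets)):
            overlaps.append(len(neighbor_sets[i] & neighbor_sets[j]) / k)
    return float(np.mean(overlaps))
```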
A limitation of the current method is that training an ensemble of foundation models is computationally expensive, leading the team to explore more efficient ways to create multiple models in future research. The work was funded in part by the MIT-IBM Watson AI Lab, MathWorks, and Amazon.
In practical terms, if an AI's primary function were identifying whether an image shows a cat or a dog, a problem might arise if a model trained on general data were later used to classify animal species it had never encountered before. Foundation models also differ from traditional machine-learning models, which produce concrete outputs such as labeling an image "cat" or "dog." Instead, a foundation model generates an abstract representation of each data point.
In the future, this work could have significant implications for settings where data come from a single individual, such as a specific patient's characteristics in health and wellness applications.