MIT researchers have developed a technique for improving the accuracy of uncertainty estimates in machine-learning models. This matters most where models inform critical decisions, such as diagnosing diseases from medical imaging or screening job applications. The new method is more efficient than previous approaches and scales to the large deep-learning models used in healthcare and other high-stakes settings.
Machine-learning models are often designed to report a confidence level alongside each prediction. For instance, a model might say it is 49% confident that a medical image shows a particular disease. For that estimate to be trustworthy, the model should be correct about 49% of the time on predictions made with that level of confidence. If the confidence is miscalibrated, the results can be misleading or even harmful.
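To make that requirement concrete, one common way to check calibration is to bucket a model's predictions by confidence and compare each bucket's average confidence with its empirical accuracy. The short Python sketch below computes such an expected calibration error; the function name and bin count are illustrative and are not part of the MIT technique.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Bucket predictions by confidence and compare each bucket's
    average confidence to its empirical accuracy (a standard ECE)."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if not mask.any():
            continue
        avg_conf = confidences[mask].mean()   # what the model claims
        accuracy = correct[mask].mean()       # what actually happened
        ece += mask.mean() * abs(avg_conf - accuracy)
    return ece

# A well-calibrated model that says "49% confident" should be right
# about 49% of the time on those predictions, so its ECE is near zero.
```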
The team from MIT has devised a way to make these confidence levels accurate. Their method builds on the minimum description length (MDL) principle, which requires fewer of the assumptions that other uncertainty-quantification methods rely on and can therefore produce better-calibrated uncertainty estimates.
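One standard formalization behind MDL-based uncertainty is the predictive normalized maximum likelihood (pNML) distribution, in which the model is imagined to be refit for every candidate label of a test point and the results are normalized. The sketch below states this general form; the exact quantity the MIT method approximates may differ in its details.

```latex
% pNML: refit the parameters \hat{\theta}_y as if label y were correct
% for input x, then normalize over all candidate labels y'.
p_{\mathrm{pNML}}(y \mid x) =
  \frac{p_{\hat{\theta}_y}(y \mid x)}{\sum_{y'} p_{\hat{\theta}_{y'}}(y' \mid x)},
\qquad
\Gamma(x) = \log \sum_{y'} p_{\hat{\theta}_{y'}}(y' \mid x)
```

The normalizer Γ(x) acts as a complexity, or regret, term: if the model can be pushed toward many different labels for the same input, Γ(x) is large and the model's confidence should be correspondingly low. Computing it exactly would require retraining the model once per candidate label, which is why a fast approximation matters for deep networks.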
The researchers created a technique called IF-COMP that makes the MDL principle fast enough to use with large-scale deep-learning models. In tests, IF-COMP was both faster and more accurate than competing methods, producing uncertainty estimates that better reflect a model's true confidence.
A key aspect of IF-COMP is that it can also detect whether certain data points have been mislabeled and flag data points that are outliers. The researchers combined influence functions with temperature scaling to produce high-quality approximations of stochastic data complexity.
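The influence-function machinery is involved, but the temperature-scaling ingredient is straightforward to illustrate. The Python sketch below fits a single temperature on held-out data by minimizing negative log-likelihood, a standard post-hoc calibration step; it is a simplified stand-in rather than the researchers' actual implementation, and the function names are assumptions.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def softmax(logits, T):
    # Divide logits by a temperature T > 1 to soften overconfident outputs.
    z = logits / T
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def fit_temperature(val_logits, val_labels):
    """Find a single temperature that minimizes negative log-likelihood
    on held-out data (standard post-hoc calibration)."""
    def nll(T):
        probs = softmax(val_logits, T)
        picked = probs[np.arange(len(val_labels)), val_labels]
        return -np.log(picked + 1e-12).mean()
    result = minimize_scalar(nll, bounds=(0.05, 10.0), method="bounded")
    return result.x
```

Temperature scaling on its own only rescales a model's existing confidences; in IF-COMP it is paired with influence functions, which estimate how the model would change if a data point were relabeled, to approximate the complexity term efficiently.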
The technique is model-agnostic, meaning it works with many types of machine-learning models, so it can be deployed in a wider range of real-world situations and help practitioners make better-informed decisions.
The researchers stressed the urgency of this work, as machine-learning models trained on large amounts of unexamined data are increasingly applied to problems that directly affect people. They expect IF-COMP to serve as a valuable auditing tool, offering an efficient way to gauge the reliability of a model's predictions.
Ultimately, this research highlights the fallibility of machine-learning systems and the need for better calibration and uncertainty-estimation techniques. Looking ahead, the researchers plan to apply their approach to large language models and to explore other potential uses of the minimum description length principle.