MIT researchers have developed an algorithm called FeatUp that enables computer vision models to capture both the high-level gist and the fine-grained details of a scene simultaneously. Like people recalling a scene, modern computer vision algorithms retain the broad strokes of an image while the more nuanced specifics are often lost. To understand an image, they break it into a grid of small square patches and process the patches together to comprehend what’s happening. Because each patch typically spans 16 to 32 pixels on a side, the resolution at which these algorithms operate is substantially lower than that of the images they process.
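The resolution loss described above follows from simple arithmetic. As an illustration (the image and patch sizes below are typical examples, not figures from the article), a model that tokenizes a square image into fixed-size patches produces a feature grid far smaller than the input:

```python
def feature_grid_size(image_size: int, patch_size: int) -> int:
    """Side length of the patch grid for a square image of the given size."""
    return image_size // patch_size

# A hypothetical 224x224 input with 16-pixel patches yields a 14x14 feature grid;
# with 32-pixel patches the grid shrinks to 7x7.
print(feature_grid_size(224, 16))  # 14
print(feature_grid_size(224, 32))  # 7
```

So a model sees tens of thousands of pixels but reasons over only a few hundred feature locations, which is the gap FeatUp targets.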
The FeatUp algorithm addresses the loss of detail that occurs in this process. It can sharpen the resolution of any deep network’s features without sacrificing processing speed or quality, making it easier for researchers to interpret the algorithm’s predictions accurately. If the algorithm is used to detect lung cancer, for instance, pairing it with FeatUp yields a far more detailed view of where a tumor might be.
FeatUp enhances tasks beyond prediction alone, including object detection, semantic segmentation (labeling every pixel in an image with the object it belongs to), and depth estimation. It does so by supplying the accurate, high-resolution features that are crucial for developing vision applications such as autonomous driving and medical imaging.
Interestingly, FeatUp draws out these fine-grained details by slightly perturbing an image, for instance by shifting it a few pixels left or right, and observing how the algorithm responds to these subtle modifications. This produces many slightly varied deep-feature maps, which are then combined into a single high-resolution set of deep features. The method sharpens an algorithm’s grasp of high-resolution detail, boosting the performance of any model that uses it without compromising its effectiveness on time-sensitive tasks.
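The jitter-and-combine idea above can be sketched in a few lines. This is only a toy illustration under simplifying assumptions, not FeatUp’s actual implementation: `toy_backbone` is a stand-in average-pooling "feature extractor", the shifts are circular, and the feature maps are combined by naive nearest-neighbor upsampling and averaging, whereas FeatUp learns the combination:

```python
import numpy as np

def toy_backbone(image: np.ndarray, patch: int = 4) -> np.ndarray:
    """Stand-in feature extractor: average-pools the image into a coarse
    grid, mimicking a low-resolution deep-feature map."""
    h, w = image.shape
    return image.reshape(h // patch, patch, w // patch, patch).mean(axis=(1, 3))

def multiview_features(image: np.ndarray, shifts=(-2, -1, 0, 1, 2),
                       patch: int = 4) -> np.ndarray:
    """Simplified sketch of the jitter-and-combine idea: shift the image a
    few pixels, extract coarse features for each shift, upsample back to
    pixel resolution, undo the shift, and average the views."""
    h, w = image.shape
    acc = np.zeros((h, w))
    for dx in shifts:
        shifted = np.roll(image, dx, axis=1)          # jitter the input
        feats = toy_backbone(shifted, patch)          # coarse feature map
        up = np.kron(feats, np.ones((patch, patch)))  # naive upsampling
        acc += np.roll(up, -dx, axis=1)               # align back
    return acc / len(shifts)

img = np.arange(64, dtype=float).reshape(8, 8)
hires = multiview_features(img)
print(hires.shape)  # (8, 8): pixel-resolution features from 2x2 coarse maps
```

Each shifted view samples the patch grid at a slightly different offset, so averaging the aligned views recovers sub-patch detail that no single coarse map contains.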
The MIT researchers hope FeatUp will be widely adopted within the research community and beyond, serving as a crucial tool in deep learning. The technology not only lets models perceive the world in greater detail, but also makes high-resolution processing more efficient.
The study, supported by the National Science Foundation Graduate Research Fellowship, the National Science Foundation, Office of the Director of National Intelligence, the U.S. Air Force Research Laboratory, and the U.S. Air Force Artificial Intelligence Accelerator, was presented at the International Conference on Learning Representations.