Researchers from the Massachusetts Institute of Technology (MIT) have adapted the Texture Tiling Model (TTM) to address a persistent problem: accurately modelling human visual perception, and peripheral vision in particular, within deep neural networks (DNNs). Peripheral vision represents the world with progressively less fidelity the farther it is from the point of fixation, yet it receives comparatively little attention in computer vision algorithms.
The paper seeks to bridge the gap between human and machine perception by comparing how DNNs and humans perform on tasks subject to peripheral vision constraints. Existing strategies for modelling peripheral vision in DNNs are fragmented and often rely on specific architectures, loss-of-resolution models or style transfer techniques. These methods do not fully capture the complexity of peripheral vision, including crowding effects and susceptibility to clutter. The researchers modified the existing TTM, a well-validated model of human peripheral vision, to make it easier to use with DNNs, producing a new variant called the Uniform Texture Tiling Model (uniformTTM). This allowed them to generate images that represent the information available in human peripheral vision and to use those images to train and evaluate DNNs.
By applying uniformTTM to the large COCO dataset and transforming its images to simulate peripheral vision, the research team created a new dataset named COCO-Periph. This resource allowed them to compare human and DNN performance on peripheral object detection through a set of psychophysics experiments. They found that although DNNs trained on COCO-Periph improved on pre-trained models, their performance still fell short of human levels, particularly in sensitivity to clutter. DNNs trained on COCO-Periph also showed small gains in corruption robustness, suggesting a possible link between peripheral vision and robustness.
The paper emphasises the value of accurately modelling peripheral vision within DNNs in order to reproduce the characteristics of human visual processing. The methods it introduces, uniformTTM and the COCO-Periph dataset, are a significant step towards this goal, but work remains to close the performance gap between humans and DNNs. The experiments underscore the need to adapt DNNs for use across multiple tasks and to further examine the connection between peripheral vision and robustness. Even so, this research lays important groundwork for progress in areas such as driver safety, foveated rendering, UI/UX design, and content memorability and compression, where an accurate model of human-like visual perception could substantially improve machine performance.