Researchers at the Massachusetts Institute of Technology (MIT) have developed an image dataset that simulates peripheral vision in artificial intelligence (AI) models. The work aims to help such models detect approaching hazards more effectively and to predict whether a human driver would notice an oncoming object.
Peripheral vision allows humans to see shapes outside their direct line of sight, albeit in less detail, a capability that AI models currently lack. Giving AI models this ability could enhance driver safety, support the creation of displays that are easier to view, and help researchers predict human behaviour more accurately, since peripheral vision can prompt the eyes to move and gather more information.
To achieve this, the MIT researchers started from a technique traditionally used to model human peripheral vision, the texture tiling model, which transforms images to represent a human’s loss of visual detail. They then slightly altered the model so it could transform images in a similar way without needing to specify in advance where the viewer, human or AI, would direct their gaze.
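The intuition behind this kind of image transformation can be illustrated with a toy sketch. The code below is not the texture tiling model itself; it simply replaces pixels with progressively coarser block averages the farther they sit from a chosen fixation point, a crude stand-in for peripheral detail loss:

```python
import numpy as np

def peripheral_degrade(img, fixation, n_levels=3):
    """Toy stand-in for peripheral detail loss (not the actual texture
    tiling model): pixels farther from the fixation point are replaced
    with progressively coarser block averages. Assumes the image's
    height and width are divisible by 2 ** n_levels."""
    h, w = img.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Eccentricity: pixel distance from the fixation point.
    ecc = np.hypot(ys - fixation[0], xs - fixation[1])
    out = img.astype(float).copy()
    for level in range(1, n_levels + 1):
        block = 2 ** level  # pool over 2x2, 4x4, 8x8 ... blocks
        coarse = img.astype(float).reshape(
            h // block, block, w // block, block).mean(axis=(1, 3))
        coarse = np.repeat(np.repeat(coarse, block, axis=0), block, axis=1)
        # Farther rings of the image receive coarser and coarser detail.
        mask = ecc > ecc.max() * level / (n_levels + 1)
        out[mask] = coarse[mask]
    return out
```

With the fixation point at the image centre, pixels near the centre keep their exact values while the corners are reduced to 8x8 block averages.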
The modified technique was used to generate a large dataset of transformed images that represent the loss of detail when a human looks into their periphery. This dataset was then used to train several computer vision models, whose performance was compared with that of humans on an object detection task.
While training models from scratch on this dataset produced the largest performance gains, improving their ability to detect and recognise objects, the models still fell short of human performance in every case, particularly when detecting objects in the far periphery.
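A comparison like this is typically read off an accuracy-versus-eccentricity curve. As a purely hypothetical illustration (the helper function and the trial data below are not from the study), detection accuracy can be grouped by how far the target appeared from fixation:

```python
import numpy as np

def accuracy_by_eccentricity(correct, ecc, bin_edges):
    """Hypothetical analysis helper (not from the study): group trial
    outcomes by target eccentricity and report detection accuracy
    for each bin."""
    correct = np.asarray(correct, dtype=float)
    idx = np.digitize(ecc, bin_edges)  # bin 0 is below the first edge
    return {int(i): float(correct[idx == i].mean()) for i in np.unique(idx)}

# Illustrative made-up trials: near targets detected, far ones missed.
acc = accuracy_by_eccentricity(
    correct=[1, 1, 1, 0, 0],
    ecc=[2.0, 3.0, 8.0, 15.0, 20.0],  # degrees from fixation
    bin_edges=[5.0, 10.0])
# acc maps bin index -> accuracy; the farthest bin scores lowest here
```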
MIT researchers are set to further explore these differences in their quest to find a model that can predict human performance in the visual periphery. The ultimate aim is to create AI systems that alert drivers to hazards that they may not see. The researchers also hope to inspire other researchers to carry out additional computer vision studies with their publicly available dataset.
The research was supported in part by the Toyota Research Institute and the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL) METEOR Fellowship. The work will be presented at the International Conference on Learning Representations.