A team from MIT has created an image dataset aimed at simulating peripheral vision in machine-learning models, a capability that AI typically lacks. This could improve the models' ability to recognise approaching threats and to predict whether a human driver would spot an oncoming object. In experiments, these models improved at hazard detection,…
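The summary does not describe how the dataset is built, but simulating peripheral vision typically means degrading an image more the further a region lies from a fixation point. The sketch below is a crude, hypothetical stand-in for such a transform (not the MIT team's actual method): it applies progressively stronger box blur with eccentricity, using only NumPy.

```python
import numpy as np

def box_blur(img, k):
    # Separable box blur with odd kernel size k (pure NumPy, edge-padded).
    if k <= 1:
        return img
    pad = k // 2
    p = np.pad(img, pad, mode="edge")
    kernel = np.ones(k) / k
    p = np.apply_along_axis(lambda r: np.convolve(r, kernel, mode="valid"), 0, p)
    p = np.apply_along_axis(lambda r: np.convolve(r, kernel, mode="valid"), 1, p)
    return p

def simulate_periphery(img, fovea, levels=(1, 3, 7)):
    # Toy peripheral-vision transform: blur strength grows with eccentricity,
    # i.e. the distance of each pixel from the fixation point `fovea`.
    h, w = img.shape
    ys, xs = np.mgrid[0:h, 0:w]
    ecc = np.hypot(ys - fovea[0], xs - fovea[1])
    ecc = ecc / ecc.max()
    blurred = [box_blur(img, k) for k in levels]
    out = np.empty_like(img, dtype=float)
    idx = np.minimum((ecc * len(levels)).astype(int), len(levels) - 1)
    for i, b in enumerate(blurred):
        out[idx == i] = b[idx == i]
    return out

img = np.arange(100, dtype=float).reshape(10, 10)
sharpened_centre = simulate_periphery(img, fovea=(0, 0))
```

A model trained on images transformed this way sees sharp detail only near the fixation point, which is the property the dataset is meant to capture.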

Researchers from Fudan University and Microsoft have developed a novel architecture for large multimodal models (LMMs), called "DeepStack." DeepStack takes a different approach to processing visual data, improving both computational efficiency and performance.
Traditional LMMs typically integrate visual and textual data by converting images into visual tokens, which are then processed…
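As we read the name and the summary, the "stacking" idea is to split the visual tokens into groups and inject one group into the hidden states at each successive layer, rather than feeding them all at the input. The toy NumPy sketch below illustrates only that flow of tokens; the layer itself is a stand-in, not the actual DeepStack architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_layer(x, W):
    # Stand-in for a transformer layer: residual nonlinear mixing.
    return x + np.tanh(x @ W)

dim, n_layers, n_slots = 8, 4, 4
weights = [rng.normal(scale=0.1, size=(dim, dim)) for _ in range(n_layers)]
text = rng.normal(size=(6, dim))                       # 6 text tokens
visual = rng.normal(size=(n_layers * n_slots, dim))    # 16 visual tokens
groups = np.split(visual, n_layers)                    # 4 tokens per layer

# Reserve n_slots positions for visual content, then inject one group of
# visual tokens into those slots before each layer instead of all at once.
x = np.concatenate([np.zeros((n_slots, dim)), text])
for W, g in zip(weights, groups):
    x[:n_slots] += g
    x = toy_layer(x, W)
```

Because each layer only ever sees `n_slots` visual positions, the sequence length stays short even as the total number of visual tokens grows, which is presumably where the efficiency gain comes from.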

Researchers from MIT and the MIT-IBM Watson AI Lab have developed a language-based navigation strategy for AI robots. The method uses textual descriptions instead of visual information, which simplifies robotic navigation: processing visual data traditionally demands significant computational capacity and carefully hand-crafted machine-learning models. The researchers' approach involves converting a…
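The pipeline implied by the summary — describe the scene in text, then let a language model choose the next action — can be sketched as below. Both components are hypothetical stubs standing in for a real captioner and a real language model, not the researchers' actual system.

```python
from typing import List

def caption_observation(objects_in_view: List[str]) -> str:
    # Stand-in for an off-the-shelf image captioner: the robot's visual
    # observation is rendered as plain text before any reasoning happens.
    return "You see: " + ", ".join(objects_in_view) + "."

def choose_action(caption: str, goal: str) -> str:
    # Stand-in for a language model acting as the navigation policy:
    # a trivial keyword rule, purely for illustration.
    return "move_forward" if goal in caption else "turn_left"

caption = caption_observation(["a sofa", "a lamp"])
action = choose_action(caption, goal="lamp")
```

The point of the design is that everything downstream of the captioner operates on short strings rather than raw pixels, so the navigation policy needs no vision-specific machinery at all.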

Boosting the performance of solar cells, transistors, LEDs, and batteries requires better electronic materials, which are often discovered through novel compositions. Scientists have turned to AI tools to identify promising materials among millions of chemical formulations, and engineers have developed machines that can print hundreds of samples at a time, based on compositions identified by AI algorithms.…
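The screen-then-print loop described above can be sketched in a few lines: enumerate candidate compositions, score each with a surrogate property predictor, and hand the top batch to the printer. The scoring function here is a made-up stand-in, not any real model from the work.

```python
import itertools
import heapq

def surrogate_score(composition):
    # Hypothetical learned property predictor: peaks at a fictitious
    # "best" three-component mix of 30% / 50% / 20%.
    a, b, c = composition
    return -(a - 0.3) ** 2 - (b - 0.5) ** 2 - (c - 0.2) ** 2

# Enumerate three-component compositions on a 10% grid that sum to 1.
fractions = [i / 10 for i in range(11)]
candidates = [(a, b, round(1 - a - b, 1))
              for a, b in itertools.product(fractions, fractions)
              if a + b <= 1]

# Select the highest-scoring batch for the high-throughput printer.
batch = heapq.nlargest(5, candidates, key=surrogate_score)
```

In practice the candidate space is millions of formulations rather than a coarse grid, but the division of labour is the same: the model ranks, the printer validates.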

Researchers from MIT and the MIT-IBM Watson AI Lab have introduced an efficient method to train machine-learning models to identify specific actions in videos using the videos' automatically generated transcripts. The method, known as spatio-temporal grounding, helps the model build a fine-grained understanding of the video by dissecting it and analysing it through the lens…
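The temporal half of grounding amounts to matching a transcript sentence to the video segment it describes. A minimal sketch, assuming the sentence and the clips have already been embedded into a shared vector space (the embeddings below are toy values, not from any real encoder):

```python
import numpy as np

def ground_sentence(text_emb, clip_embs):
    # Score each video clip against the sentence by cosine similarity
    # and return the index of the best-matching segment.
    t = text_emb / np.linalg.norm(text_emb)
    c = clip_embs / np.linalg.norm(clip_embs, axis=1, keepdims=True)
    return int(np.argmax(c @ t))

# Toy embeddings: clip 2 points in the same direction as the sentence.
sentence = np.array([1.0, 0.0, 1.0])
clips = np.array([[0.0, 1.0, 0.0],
                  [1.0, 0.0, 0.0],
                  [2.0, 0.0, 2.0],
                  [0.0, 1.0, 1.0]])
best = ground_sentence(sentence, clips)
```

The appeal of transcript supervision is that these sentence–segment pairs come for free from automatic speech recognition, with no manual action labels.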

A team of researchers from the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL) and Google Research has developed an image-to-image diffusion model called Alchemist, which allows users to modify the material properties of objects in photos. The system adjusts attributes such as roughness, metallicity, innate colour (albedo), and transparency, and can be applied to…