
Computer vision

Scientists improve peripheral vision capabilities in AI systems.

A team from MIT has created an image dataset aimed at simulating peripheral vision in machine learning models, a capability that AI typically lacks. This could improve the models' ability to recognise approaching threats and predict whether a human driver would spot an oncoming object. In experiments, these models improved in terms of hazard detection,…
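The MIT dataset was built with a perceptually motivated image transform; as a much simpler stand-in (the function names and the eccentricity-dependent blur below are my own illustration, not the team's method), one can degrade an image progressively with distance from a fixation point:

```python
import numpy as np

def blur(img, k=9):
    # Crude k-by-k mean blur from shifted copies (edges wrap; fine for a sketch).
    r = k // 2
    acc = np.zeros_like(img, dtype=float)
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            acc += np.roll(np.roll(img, dy, axis=0), dx, axis=1)
    return acc / (k * k)

def peripheralize(img, fixation=(0.5, 0.5), fovea_radius=0.3):
    # Blend the sharp image with a blurred copy, weighted by eccentricity
    # (distance from the fixation point), so detail degrades toward the edges.
    h, w = img.shape
    ys, xs = np.mgrid[0:h, 0:w]
    fy, fx = fixation[0] * h, fixation[1] * w
    ecc = np.hypot(ys - fy, xs - fx) / np.hypot(h, w)
    alpha = np.clip((ecc - fovea_radius) / fovea_radius, 0.0, 1.0)
    return (1.0 - alpha) * img + alpha * blur(img)
```

Pixels inside the "fovea" pass through untouched, while the periphery is increasingly replaced by the blurred copy, which is the kind of degradation such a dataset aims to capture.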


DeepStack: Boosting Multimodal Models with Layered Visual Token Stacking for Strong High-Resolution Performance

Researchers from Fudan University and Microsoft have developed a novel architecture for large multimodal models (LMMs), called "DeepStack." DeepStack takes a different approach to processing visual data, improving computational efficiency and performance. Traditional LMMs typically integrate visual and textual data by converting images into visual tokens, which are then processed…
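The core idea can be sketched loosely as follows (a toy numpy illustration of layer-wise token injection, not the authors' implementation; all names are mine): instead of lengthening the first layer's input sequence with every visual token, each "stack" of tokens is added to the hidden states of a successive layer.

```python
import numpy as np

def deepstack_forward(text_tokens, visual_stacks, layer_fns, vis_slice):
    # text_tokens:   (T, D) hidden states entering the first layer.
    # visual_stacks: list of (V, D) visual-token grids, one per early layer.
    # layer_fns:     the model's layers, as callables on (T, D) arrays.
    # Rather than concatenating L*V visual tokens into the input
    # (sequence length T + L*V), each stack is added residually to the
    # hidden states of its own layer, so the length stays fixed at T.
    h = text_tokens.astype(float)
    for i, layer in enumerate(layer_fns):
        if i < len(visual_stacks):
            h = h.copy()
            h[vis_slice] += visual_stacks[i]  # inject this layer's stack
        h = layer(h)
    return h
```

With identity layers, three (V, D) stacks injected at the same positions accumulate across layers while the sequence length never grows, which is where the efficiency gain at high resolution comes from.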


Researchers use large language models to help robots navigate.

Researchers from MIT and the MIT-IBM Watson AI Lab have developed a language-based navigation strategy for AI robots. The method uses textual descriptions instead of visual information, effectively simplifying robotic navigation. Processing visual data traditionally requires significant computational capacity and carefully hand-crafted machine-learning models. The researchers' approach involves converting a…
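A minimal sketch of the text-only interface such a system implies (this helper and its names are hypothetical, for illustration only): the robot's visual history is rendered as plain text so a language model can pick the next action, with no raw pixels involved.

```python
def build_navigation_prompt(instruction, caption_history, actions):
    # Hypothetical helper: turn the task and the captions of what the
    # robot has seen so far into a text prompt for a language model,
    # which is asked to choose the next action from a fixed set.
    lines = [f"Task: {instruction}", "Observations so far:"]
    lines += [f"  step {i}: {cap}" for i, cap in enumerate(caption_history)]
    lines.append("Next action (one of: " + ", ".join(actions) + "):")
    return "\n".join(lines)
```

For example, `build_navigation_prompt("find the kitchen", ["a hallway with a door on the left"], ["forward", "turn left", "turn right", "stop"])` yields a short prompt a language model can answer with a single action word.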


A novel computer vision approach accelerates the screening of electronic materials.

Better-performing solar cells, transistors, LEDs, and batteries require improved electronic materials, often discovered through novel compositions. Scientists have turned to AI tools to identify potential materials from millions of chemical formulations, with engineers developing machines that can print hundreds of samples at a time, based on compositions identified by AI algorithms.…


Searching for a particular activity in a video? This method, powered by artificial intelligence, can locate it for you.

Researchers from MIT and the MIT-IBM Watson AI Lab have introduced an efficient method to train machine-learning models to identify specific actions in videos by making use of the video's automatically generated transcripts. The method, known as spatio-temporal grounding, helps the model develop a fine-grained understanding of the video by dissecting it and analysing it through the lens…
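The temporal half of the problem — finding *when* a described action occurs — can be sketched with a toy similarity search (a stand-in for the learned alignment, not the paper's method; the function and its parameters are illustrative):

```python
import numpy as np

def best_window(frame_feats, query, win=5):
    # Slide a fixed-length window over per-frame features and return the
    # span whose mean feature is most similar (cosine similarity) to a
    # text-query embedding, e.g. one derived from a transcript sentence.
    def cos(a, b):
        return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)
    scores = [cos(frame_feats[i:i + win].mean(axis=0), query)
              for i in range(len(frame_feats) - win + 1)]
    start = int(np.argmax(scores))
    return start, start + win
```

Given features where a handful of consecutive frames point in the query's direction, the search returns exactly that span; the learned system additionally localises the action spatially within each frame.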


A controlled diffusion model can alter the material properties of objects in images.

A team of researchers from the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL) and Google Research has developed an image-to-image diffusion model called Alchemist, which allows users to modify the material properties of objects in photos. The system adjusts aspects such as roughness, metallicity, intrinsic color (albedo), and transparency, and can be applied to…
