Researchers from Capital Normal University and the School of Artificial Intelligence at Beijing University of Posts and Telecommunications have developed RealNet, a new feature reconstruction framework for industrial image anomaly detection. This approach addresses ongoing issues with generating diverse, realistic anomalies that align with natural distributions, as well as challenges around feature redundancy and pre-training…
Tyler Perry, an acclaimed film producer, recently revealed that he has postponed his $800 million expansion plans for his Atlanta studio indefinitely. The decision comes in the wake of OpenAI's latest technological innovation, a text-to-video model called Sora.
Initially unveiled on February 15, 2024, OpenAI's Sora generates video from text prompts. This…
On March 17, 2024, US-based startup Cognition introduced Devin, which it describes as the world's first fully autonomous AI software engineer. Devin can resolve engineering tasks independently using its built-in shell, code editor, and web browser.
One of Devin's key features is its ability to fix bugs on GitHub autonomously. Cognition has demonstrated…
A research team from the Korea Advanced Institute of Science and Technology (KAIST) has developed MoAI, a model that combines AI's language understanding with visual perception. The model uses auxiliary visual information from specialized computer vision (CV) models, which provides a more nuanced understanding of…
Visual Language Models (VLMs), which are powerful tools for processing visual and textual data, can face difficulties due to limited data availability. Recent research developments have shown that pre-training these models on larger image-text datasets can enhance their performance in downstream tasks. However, creating these datasets can be challenging because of paired data scarcity, high…
In a move that aligns with its AI focus, Apple has acquired the Canadian startup DarwinAI. The purchase, which Apple has yet to announce officially, reportedly took place earlier this year. DarwinAI's strength lies in its development of AI systems for visual inspection of components during manufacturing processes. The company's…
Image generation from textual descriptions has revolutionized the way technology intersects with creativity. A domain that has garnered interest is subject-driven image generation. Its potential lies in creating personalized images of specific subjects from a minimal set of examples. Yet, the inability to fully capture and depict detailed attributes of a given subject within its…
On March 14th, 2024, two teenage students from Miami, Florida, aged 13 and 14, were arrested for allegedly creating and sharing explicit images of their classmates using artificial intelligence (AI). The juveniles, who were students at Pinecrest Cove Academy, reportedly used an unnamed AI application to generate and circulate the non-consensual pictures of their peers,…
In the field of 3D generative AI, reconstructing 3D objects from a limited number of views has become an active research direction. Driven by large-scale 3D datasets and advances in generative model architectures, researchers have been using 2D diffusion models to create 3D objects from input text or photos. This is primarily to address the…
The creation of lifelike images, videos, and sounds using artificial intelligence (AI) has significantly progressed recently. However, most of these developments have been focused on single modalities, ignoring the inherent multimodal nature of our world. In addressing this, researchers have introduced a novel optimization-based framework designed to seamlessly integrate visual and audio content creation. By…
Perplexity AI, a startup launched in August 2022, aims to compete with Google in the search engine sector. The company's technology merges the capabilities of a chatbot and a traditional search engine, and it has attracted investment from figures including Amazon founder Jeff Bezos. In the first few months of 2024, Perplexity AI…
Modern vision-language models (VLMs) have made significant progress on multimodal tasks by combining the reasoning abilities of large language models (LLMs) with visual encoders like ViT. Nevertheless, despite their impressive performance in tasks involving entire images, these models often struggle with fine-grained region grounding, inter-object spatial relations, and compositional reasoning. They…