Over two millennia ago, the Greek mathematician Euclid laid the groundwork for the modern understanding of geometry. Today, that work serves as the bedrock for researchers like Justin Solomon, who uses geometry to address complex problems, many of which seem unrelated to shapes at first glance. Solomon is an associate professor at MIT's Department of…
In the fast-paced digital world, the integration of visual and textual data for advanced video comprehension has emerged as a key area of study. Large Language Models (LLMs) play a vital role in processing and generating text, revolutionizing the way we engage with digital content. But, traditionally, these models are designed to be text-centric, and…
Mathematician Justin Solomon is using modern geometric techniques to solve complex problems, often unrelated to shapes. He explains that geometric tools can be used to compare datasets, providing insight into the performance of machine-learning models. He stresses the significance of distance, similarity, curvature, and shape, all concepts derived from geometry, in discussing data.
His Geometric Data…
Over two millennia ago, the ancient mathematician Euclid, widely recognized as the father of geometry, shifted our perspective on shapes. Today, Justin Solomon of MIT uses contemporary geometric methods to tackle complex challenges seemingly unrelated to shapes. Solomon utilizes geometric tools to analyze high-dimensional datasets, providing insights about the potential performance of machine learning models.…
ST-LLM: An Efficient Video-LLM Framework Incorporating Spatial-Temporal Sequence Modeling within LLM
Artificial general intelligence has advanced significantly, thanks in part to the capabilities of Large Language Models (LLMs) such as GPT, PaLM, and LLaMA. These models have shown impressive understanding and generation of natural language, highlighting the direction of future AI. However, while LLMs excel at text processing, video processing with complex temporal information remains a…
The Greek mathematician Euclid revolutionized the concept of shapes over two millennia ago, laying a strong foundation for geometry. Justin Solomon, combining Euclid's ancient principles with modern geometric techniques, is solving complex problems that seem unrelated to shapes.
Solomon, an associate professor in the MIT Department of Electrical Engineering and Computer Science (EECS) and a member of the Computer Science…
Justin Solomon is an associate professor in the MIT Department of Electrical Engineering and Computer Science and a member of the Computer Science and Artificial Intelligence Laboratory who is using geometric techniques to solve complex problems in data science and artificial intelligence, among other areas. These techniques draw upon the geometric structures within datasets to…
Over 2,000 years after Euclid's groundbreaking work in geometry, MIT associate professor Justin Solomon is using the ancient principles in fresh, modern ways. Solomon's work in the Geometric Data Processing Group applies geometry to solve a variety of problems, from comparing datasets in machine learning to enhancing generative AI models. His work assumes a variety…
Researchers from New York University, ELLIS Institute, and the University of Maryland have developed a model, known as Contrastive Style Descriptors (CSD), that enables a more nuanced understanding of artistic styles in digital artistry. The aim is to determine whether generative models like Stable Diffusion and DALL-E are merely replicating existing…
Machine learning researchers have developed a cost-effective reward mechanism to help improve how language models interact with video data. The technique involves using detailed video captions to measure the quality of responses produced by video language models. These captions serve as proxies for actual video frames, allowing language models to evaluate the factual accuracy of…
Google researchers have developed a new streaming dense video captioning model that aims to improve on previous methods by localizing events within a video and generating appropriate captions for them in real time. Existing approaches are hindered by limited frame processing, which leads to incomplete or inadequate video descriptions.
The existing dense video captioning models have…
