A research team at Microsoft has developed a new, more efficient method for teaching robots complex tasks. The method, called Primitive Sequence Encoding (PRISE), enables machines to break intricate activities down into simpler sub-tasks and learn them step by step. This technique shows great potential for improving machines' overall learning capabilities and performance within a shorter…
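The announcement above does not spell out implementation details, but the general idea of compressing low-level action sequences into reusable primitives can be sketched roughly as below. The BPE-style merging, the toy integer actions, and all function names here are illustrative assumptions, not the PRISE implementation.

```python
from collections import Counter

def most_frequent_pair(sequences):
    """Count adjacent token pairs across all demonstration sequences."""
    pairs = Counter()
    for seq in sequences:
        pairs.update(zip(seq, seq[1:]))
    return pairs.most_common(1)[0][0] if pairs else None

def merge_pair(sequences, pair, new_token):
    """Replace every occurrence of `pair` with a single higher-level primitive."""
    merged = []
    for seq in sequences:
        out, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(new_token)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        merged.append(out)
    return merged

def learn_primitives(sequences, num_merges=10):
    """BPE-style compression: frequent action pairs become reusable primitives."""
    vocab, next_id = {}, max(t for s in sequences for t in s) + 1
    for _ in range(num_merges):
        pair = most_frequent_pair(sequences)
        if pair is None:
            break
        vocab[next_id] = pair
        sequences = merge_pair(sequences, pair, next_id)
        next_id += 1
    return sequences, vocab

# Toy demonstrations: integers stand in for discretized low-level robot actions.
demos = [[1, 2, 3, 1, 2, 3, 4], [1, 2, 3, 4, 4]]
compressed, primitives = learn_primitives(demos, num_merges=3)
print(compressed, primitives)
```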
A recent study from a team of Apple researchers has assessed the limitations of Vision-Language Models (VLMs). VLMs, including OpenAI's GPT-4V, have seen substantial improvements recently, showing impressive performance across various vision-language tasks. However, the researchers found a significant gap between the high performance of Large Language Models (LLMs) on text-based tasks and VLMs' capability in…
Large language models (LLMs) that excel at a wide range of problems often falter on complex mathematical reasoning tasks, which demand multi-step reasoning. Instruction Tuning can help develop this capability, but its effectiveness is hampered by the limited availability of mathematical reasoning datasets. The scarcity of such datasets underscores the…
Large language models (LLMs) are rapidly advancing, displaying impressive performance on math, science, and coding tasks. This progress is in part due to advances in Reinforcement Learning from Human Feedback (RLHF) and instruction fine-tuning, which align LLMs more closely with human behaviors and preferences. Moreover, innovative prompting strategies, like Chain-of-Thought and Tree-of-Thoughts, have augmented LLM…
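As a concrete illustration of the prompting strategies mentioned above, a minimal Chain-of-Thought prompt simply includes a worked example with explicit intermediate reasoning before the target question. The example content below is illustrative and not drawn from the research being summarized; no particular model or API is assumed.

```python
# A minimal Chain-of-Thought prompt: one worked example with step-by-step
# reasoning, followed by the question we actually want answered.

few_shot_example = (
    "Q: A cafeteria had 23 apples. They used 20 for lunch and bought 6 more. "
    "How many apples do they have?\n"
    "A: They started with 23 apples. After using 20, they had 23 - 20 = 3. "
    "Buying 6 more gives 3 + 6 = 9. The answer is 9.\n"
)

target_question = (
    "Q: A library had 45 books, lent out 17, and received 12 donations. "
    "How many books does it have now?\n"
    "A:"
)

# The model is expected to imitate the step-by-step reasoning pattern,
# producing intermediate arithmetic before stating the final answer.
cot_prompt = few_shot_example + "\n" + target_question
print(cot_prompt)
```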
Diffusion models are making significant strides in machine learning, modeling complex data distributions and generating realistic samples across domains such as images, videos, audio, and 3D scenes. Nevertheless, a full theoretical understanding of generative diffusion models remains a challenging frontier that calls for a more elaborate understanding, particularly…
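For readers unfamiliar with the mechanics being theorized about, the standard DDPM formulation (a common baseline, not specific to the work described here) defines a forward noising process with a closed-form sample x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps. The NumPy sketch below uses an illustrative linear noise schedule.

```python
import numpy as np

# Standard DDPM-style noise schedule (illustrative values).
T = 1000
betas = np.linspace(1e-4, 0.02, T)      # per-step noise variances
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)         # alpha_bar_t = prod_{s<=t} alpha_s

def forward_sample(x0, t, rng):
    """Sample x_t directly from x_0 via the closed-form forward process:
    x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps, eps ~ N(0, I)."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps, eps

rng = np.random.default_rng(0)
x0 = rng.standard_normal(8)             # a toy "data point"
x_noisy, eps = forward_sample(x0, t=500, rng=rng)
print(x_noisy)                           # a trained model would learn to predict eps
```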
Gradient Low-Rank Projection (GaLore), a new method developed by researchers from the California Institute of Technology, Meta AI, the University of Texas at Austin, and Carnegie Mellon University, offers an innovative approach to the memory-intensive nature of training large language models (LLMs). It provides an alternative to the conventional approach of reducing model weights, which often results in…
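The core idea, as publicly described, is to project gradients into a low-rank subspace so that optimizer state lives in that much smaller space, then map the resulting update back to full rank. The sketch below shows that general mechanism with a plain scaled update standing in for Adam; the names and the single-step setup are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def low_rank_projector(grad, rank):
    """Top-`rank` left singular vectors of the gradient define the subspace."""
    U, _, _ = np.linalg.svd(grad, full_matrices=False)
    return U[:, :rank]                    # shape: (m, rank)

def galore_style_step(weight, grad, P, lr=1e-2):
    """Project the gradient into the low-rank subspace, take the optimizer step
    there, then project the update back to the full weight space."""
    low_rank_grad = P.T @ grad            # (rank, n): what the optimizer would see
    update = lr * low_rank_grad           # stand-in for Adam on the small state
    return weight - P @ update            # map the update back to full rank

rng = np.random.default_rng(0)
W = rng.standard_normal((256, 128))
G = rng.standard_normal((256, 128))       # a toy gradient
P = low_rank_projector(G, rank=8)         # in practice, refreshed periodically
W = galore_style_step(W, G, P)
print(W.shape)
```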
Traditionally, machine learning models have been trained and tested on data from the same distribution. However, researchers have found that models perform more effectively when they can handle data from multiple distributions. This flexibility is often achieved through “rich representations,” which surpass the capabilities of models trained with traditional sparsity-inducing regularization or common stochastic gradient methods.
However, optimizing…
Researchers at Google DeepMind have developed a novel method, AtP*, for understanding the behaviors of large language models (LLMs). The technique builds on its predecessor, Attribution Patching (AtP), preserving its central concept of attributing behaviors to specific model components while significantly refining the process to correct its inherent limitations.
The heart of AtP* involves…
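Although the summary above is cut short, the original Attribution Patching idea that AtP* builds on can be sketched: rather than re-running the model once per candidate component, a first-order Taylor approximation estimates each patch's effect on a behavioral metric from a single forward and backward pass. The PyTorch toy below illustrates only that approximation; the tiny "model" and names are assumptions for demonstration.

```python
import torch

# Attribution-patching-style estimate: the effect of replacing a clean
# activation with a corrupted one is approximated to first order as
#   delta_metric ~= (a_corrupt - a_clean) * d(metric)/d(a_clean)
# so every candidate component can be scored from one forward/backward pass.

torch.manual_seed(0)
W1 = torch.randn(4, 4)
w2 = torch.randn(4)

def intermediate(x):
    return torch.tanh(W1 @ x)            # stand-in for an internal activation

def metric_from_activation(a):
    return w2 @ a                        # stand-in for the behavior we measure

x_clean = torch.randn(4)
x_corrupt = torch.randn(4)

a_clean = intermediate(x_clean).requires_grad_(True)
a_corrupt = intermediate(x_corrupt).detach()

metric = metric_from_activation(a_clean)
metric.backward()                        # gradient of the metric w.r.t. the activation

# First-order estimate of how much patching in the corrupted activation
# would change the metric, per activation dimension.
attribution = (a_corrupt - a_clean.detach()) * a_clean.grad
print(attribution)
```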
Recent advancements in Artificial Intelligence and Deep Learning have driven significant progress in generative modeling, a subfield of Machine Learning in which models produce new data that resembles their training data. These generative AI systems display remarkable capabilities, such as creating images from text descriptions and solving complex problems. Yet there are limitations in the current…
Large language models (LLMs) are powerful tools often used for tasks like code generation, language translation, writing unit tests, and debugging. Innovations such as CodeLlama, ChatGPT, and Codex have considerably improved the coding experience with abilities like code manipulation. Moreover, some models, like AlphaCode, are pretrained on competitive programming tasks to optimize code at…
A research team from the University of California, Berkeley has developed a cutting-edge retrieval-augmented language model system designed for forecasting. The system taps into abundant web-scale data and the rapid parsing capabilities of language models (LMs), providing a scalable and efficient alternative to traditional forecasting methods, which often struggle with data scarcity or…
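At a high level, a retrieval-augmented forecasting pipeline retrieves relevant articles for a question and then conditions the LM's prediction on them. Everything in the sketch below, including the word-overlap retriever, the prompt format, and the function names, is a hypothetical illustration of that general pattern, not the Berkeley system.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Article:
    title: str
    text: str

def retrieve(question: str, corpus: List[Article], k: int = 3) -> List[Article]:
    """Hypothetical retriever: rank articles by word overlap with the question.
    A real system would use a search index or dense embeddings."""
    q_words = set(question.lower().split())
    scored = sorted(corpus,
                    key=lambda a: len(q_words & set(a.text.lower().split())),
                    reverse=True)
    return scored[:k]

def build_forecast_prompt(question: str, evidence: List[Article]) -> str:
    """Assemble a prompt asking the LM for a calibrated probability."""
    context = "\n".join(f"- {a.title}: {a.text}" for a in evidence)
    return (
        f"Recent reporting:\n{context}\n\n"
        f"Question: {question}\n"
        "Give a probability between 0 and 1 with a one-sentence rationale."
    )

corpus = [Article("Launch delayed", "The rocket launch slipped to next quarter."),
          Article("Budget approved", "Funding for the mission was approved.")]
question = "Will the rocket launch this quarter?"
prompt = build_forecast_prompt(question, retrieve(question, corpus))
print(prompt)  # the prompt would then be sent to a language model of choice
```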