Multimodal large language models (MLLMs) are crucial tools for combining the capabilities of natural language processing (NLP) and computer vision, which are needed to analyze visual and textual data. They are particularly useful for interpreting complex charts in scientific, financial, and other documents, yet the prime challenge lies in improving these models so that they understand and interpret charts accurately.…
In the rapidly advancing field of Artificial Intelligence (AI), accurately evaluating model outputs has become a complex task. State-of-the-art AI systems such as GPT-4 use Reinforcement Learning from Human Feedback (RLHF), in which human judgment guides the training process. However, as AI models grow more intricate, even experts find it challenging…
Research aimed at enhancing large multimodal models (LMMs) to interpret long video sequences effectively faces challenges stemming from the sheer number of visual tokens that vision encoders generate. These tokens pile up quickly: the LLaVA-1.6 model, for instance, generates between 576 and 2880 visual tokens for a single image, a number that significantly…
Researchers from Carnegie Mellon University and Google DeepMind have developed a novel approach for training visual-language models (VLMs) called In-Context Abstraction Learning (ICAL). Unlike traditional methods, ICAL guides VLMs to build multimodal abstractions in new domains, allowing machines to better understand and learn from their experiences.
This is achieved by focusing on four cognitive abstractions,…
Prompt engineering is an essential tool for realizing the potential of AI language models like ChatGPT. It involves the deliberate design and continuous refinement of input prompts to steer the model's output. The quality of a prompt greatly affects the AI's ability to provide relevant and coherent responses, helping the model understand the context…
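The idea of deliberately structuring a prompt can be sketched in a few lines. The template below is a hypothetical illustration of the components prompt engineers commonly iterate on (role, task, context, constraints); it is not a documented best practice from any specific vendor, and the function name is an assumption.

```python
def build_prompt(role: str, task: str, context: str, constraints: str) -> str:
    """Assemble a structured prompt from components that are
    typically refined iteratively during prompt engineering."""
    return (
        f"You are {role}.\n"
        f"Task: {task}\n"
        f"Context: {context}\n"
        f"Constraints: {constraints}"
    )

# Each field can be tweaked independently while testing model responses.
prompt = build_prompt(
    role="a technical editor",
    task="Summarize the report in three bullet points.",
    context="The report covers Q2 cloud spending.",
    constraints="Use plain language; no jargon.",
)
```

Separating the pieces this way makes refinement systematic: one component is varied at a time and the effect on the model's output is observed.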
Last summer, MIT President Sally Kornbluth and Provost Cynthia Barnhart invited researchers to submit papers laying out effective strategies, policy recommendations, and urgent actions in the field of generative artificial intelligence (AI). Of the 75 proposals received, 27 were selected for seed funding.
Impressed by the level of interest and the quality of ideas,…
Group Relative Policy Optimization (GRPO) is a recent reinforcement learning method introduced in the DeepSeekMath paper. Developed as an upgrade to the Proximal Policy Optimization (PPO) framework, GRPO aims to improve mathematical reasoning skills while reducing memory use. The technique is especially well suited to tasks that require sophisticated mathematical reasoning.
The implementation of GRPO involves several…
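The core idea behind GRPO's memory savings can be sketched briefly: instead of training a separate critic network as PPO does, GRPO samples a group of responses per prompt and normalizes each response's reward against the group's own statistics. The snippet below is a minimal sketch of that group-relative advantage computation; the function and variable names are assumptions for illustration, not taken from the DeepSeekMath implementation.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward in a group against the group mean and
    standard deviation. GRPO uses this group statistic in place of
    PPO's learned value baseline, removing the critic network and
    its memory footprint."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps guards against division by zero when all rewards are equal
    return [(r - mu) / (sigma + eps) for r in rewards]

# Example: four sampled answers to one math prompt, scored 1.0 if
# correct and 0.0 otherwise.
advs = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Correct answers receive positive advantages and incorrect ones negative advantages relative to the group, which is what drives the policy update toward better reasoning traces.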
Artificial intelligence (AI) is growing at a rapid pace, giving rise to a class of systems known as AI agents. These are sophisticated systems capable of executing tasks autonomously within specific environments, using machine learning and advanced algorithms to interact, learn, and adapt. The burgeoning infrastructure supporting AI agents involves several notable projects and trends that are…
Scientists at Sierra presented τ-bench, an innovative benchmark intended to test the performance of language agents in dynamic, realistic scenarios. Current evaluation methods are insufficient: they cannot effectively assess whether these agents can interact with human users or comply with complex, domain-specific rules, both of which are crucial for practical deployment. Most…
The field of software engineering has made significant strides with the development of Large Language Models (LLMs). These models are trained on comprehensive datasets, allowing them to efficiently perform a myriad of tasks, including code generation, translation, and optimization. LLMs are increasingly being employed for compiler optimization. However, traditional code optimization methods require…
The impact of Artificial Intelligence (AI) has grown steadily, driving the development of Large Language Models (LLMs). Engaging with the AI literature is a good way to keep up with these advances. Here are the top AI books to read in 2024:
1. "Deep Learning (Adaptive Computation and Machine Learning series)": This book…
Last summer, MIT President Sally Kornbluth and Provost Cynthia Barnhart issued a call for papers on “effective roadmaps, policy recommendations, and calls for action” in the field of generative AI. From the 75 proposals they received, 27 were chosen for seed funding. Following the enormous response, a second call for proposals was launched, which resulted…