Neural Architecture Search (NAS) uses machine learning to automate the design of neural networks. This marks a significant shift from traditional manual design and is seen as pivotal in paving the way for future advances in autonomous machine learning. Despite these benefits, adopting NAS in the past has been…
Deep Neural Networks (DNNs) have proved highly effective at improving surgical precision by accurately identifying robotic instruments and tissues through semantic segmentation. However, DNNs suffer from catastrophic forgetting: a rapid decline in performance on previously learned tasks when new ones are introduced. This poses significant problems, especially in cases where old data is no longer accessible…
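Catastrophic forgetting is easy to reproduce in miniature. The toy sketch below (NumPy only; the tasks, model, and hyperparameters are illustrative assumptions, not anything from the surgical-segmentation setting) trains a logistic-regression "network" on task A, then sequentially on a conflicting task B with no access to task A's data, and measures how task A accuracy collapses:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_task(flip):
    """Toy binary task: label = sign of the input, optionally reversed."""
    X = rng.normal(size=(200, 1))
    y = (X[:, 0] > 0).astype(float)
    return X, (1 - y) if flip else y

def train(w, b, X, y, steps=500, lr=0.5):
    """Plain gradient descent on the logistic loss."""
    for _ in range(steps):
        p = 1 / (1 + np.exp(-(X @ w + b)))  # sigmoid predictions
        g = p - y
        w -= lr * X.T @ g / len(y)
        b -= lr * g.mean()
    return w, b

def accuracy(w, b, X, y):
    return (((X @ w + b) > 0) == (y == 1)).mean()

w, b = np.zeros(1), 0.0
XA, yA = make_task(flip=False)
XB, yB = make_task(flip=True)   # task B reverses task A's decision rule

w, b = train(w, b, XA, yA)
acc_A_before = accuracy(w, b, XA, yA)

w, b = train(w, b, XB, yB)      # sequential training, no task-A data
acc_A_after = accuracy(w, b, XA, yA)
print(acc_A_before, acc_A_after)  # task-A accuracy collapses after task B
```

Because task B's gradients overwrite the weights that encoded task A, performance on A drops sharply — the same failure mode, at a much larger scale, that continual-learning methods for segmentation networks try to prevent.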
Large language models (LLMs) such as GPT-4 enable autonomous agents to carry out complex tasks in varied environments with unprecedented accuracy. However, these agents still struggle to learn from failures, which is where the Exploration-based Trajectory Optimization (ETO) method comes in. This training method, introduced by researchers from the Allen Institute for AI and Peking University's…
Despite remarkable advances in large language models (LLMs) such as ChatGPT, Llama2, Vicuna, and Gemini, these models still struggle with safety: they can generate harmful, incorrect, or biased content. The focus of this paper is SafeDecoding, a new safety-aware decoding method that seeks to shield LLMs…
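To give a feel for safety-aware decoding in general (this is a simplified illustration, not SafeDecoding's actual construction, which the paper defines over the top-ranked token sets of the original and a safety-tuned expert model), one can shift a base model's next-token distribution toward an expert's preferences:

```python
import numpy as np

def safety_aware_decode(p_base, p_expert, alpha=2.0):
    """Reweight the base model's next-token distribution toward a
    safety-tuned expert: p ∝ p_base + alpha * (p_expert - p_base).
    (Generic sketch; alpha and the formula are illustrative.)"""
    p = p_base + alpha * (p_expert - p_base)
    p = np.clip(p, 1e-12, None)   # keep probabilities non-negative
    return p / p.sum()            # renormalize to a valid distribution

# Toy 3-token vocabulary: index 0 = harmful continuation, 1 = refusal.
p_base   = np.array([0.70, 0.10, 0.20])   # base model leans harmful
p_expert = np.array([0.05, 0.80, 0.15])   # safety expert prefers refusal

p = safety_aware_decode(p_base, p_expert)
print(p.argmax())  # → 1: the refusal token now dominates
```

The key design point such methods share is that safety is enforced at decoding time, by reshaping token probabilities, rather than by further fine-tuning the base model's weights.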
The field of large language models (LLMs) has seen significant advances thanks to the introduction of State Space Models (SSMs), which offer a lower computational footprint and are viewed as a welcome alternative. The recent development of DenseSSM represents a significant milestone in this direction. Designed by a team of researchers at Huawei's Noah's Ark Lab,…
The rapid development of Large Language Models (LLMs) has seen billion- and trillion-parameter models achieve impressive performance across many fields. Their sheer scale, however, poses real deployment problems because of severe hardware requirements. Current research has focused on scaling models up to improve performance, following established scaling laws. This, however, emphasizes the…
Large Language Models (LLMs) such as GPT-4 and Llama-2, while highly capable, require fine-tuning on data tailored to specific business requirements. This process can expose the models to safety threats, most notably the Fine-tuning based Jailbreak Attack (FJAttack), in which even a small number of harmful examples introduced during the fine-tuning phase can drastically…
Generative machine learning has recently gained prominence with the advent of diffusion models (DMs). These models have proved instrumental in modeling complex data distributions and generating realistic samples across numerous domains, including images, video, audio, and 3D scenes. Despite their practical benefits, there are gaps in the full…
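The core mechanism behind diffusion models — gradually noising data until it becomes (approximately) pure Gaussian noise, then learning to reverse that process — can be sketched in a few lines. The schedule below is an illustrative linear choice, not taken from any specific paper, and the "data" is a toy 1-D distribution:

```python
import numpy as np

# Illustrative linear noise schedule over T forward steps
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas_bar = np.cumprod(1.0 - betas)   # cumulative signal-retention factor

def q_sample(x0, t, rng):
    """Sample x_t ~ q(x_t | x_0) in closed form:
    x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps."""
    eps = rng.normal(size=x0.shape)
    return np.sqrt(alphas_bar[t]) * x0 + np.sqrt(1 - alphas_bar[t]) * eps, eps

rng = np.random.default_rng(0)
x0 = rng.normal(loc=3.0, size=(10_000,))   # toy "data": mean-3 Gaussian
x_T, _ = q_sample(x0, T - 1, rng)
print(x0.mean(), x_T.mean())  # the signal mean shrinks toward 0 at step T
```

A trained diffusion model is a network that predicts `eps` from `x_t` and `t`; generation then runs the chain in reverse, denoising step by step from pure noise back to a data sample.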
Multimodal Large Language Models (MLLMs), such as Flamingo, BLIP-2, LLaVA, and MiniGPT-4, enable emergent vision-language capabilities. Their limitation, however, lies in their inability to effectively recognize and understand intricate details in high-resolution images. To address this, researchers have developed InfiMM-HD, a new architecture specifically designed for processing images of varying resolutions at a lower computational…
Training large language models (LLMs) is notoriously memory-intensive. Traditional methods for reducing memory consumption typically compress the model weights, which commonly degrades model performance. Gradient Low-Rank Projection (GaLore) is a new approach proposed by researchers from various institutions, including the…
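The central idea of low-rank gradient projection can be sketched simply: instead of compressing the weights, project each gradient onto a low-rank subspace (refreshed periodically via SVD), keep the optimizer state in that small subspace, and project updates back to full size. The sketch below (NumPy, a toy quadratic objective, plain momentum instead of Adam, and illustrative sizes and hyperparameters — a simplification of GaLore's recipe, not the paper's implementation) shows the mechanics:

```python
import numpy as np

rng = np.random.default_rng(0)

m, n, r = 64, 32, 4                 # weight shape and projection rank (toy)
W = rng.normal(size=(m, n)) * 0.1
target = rng.normal(size=(m, n))

def gradient(W):
    return W - target               # gradient of 0.5 * ||W - target||_F^2

lr, beta = 0.1, 0.9
P = None
M = np.zeros((r, n))                # optimizer state lives in r dims, not m
for step in range(200):
    G = gradient(W)
    if step % 50 == 0:              # periodically refresh projector from SVD
        U, _, _ = np.linalg.svd(G, full_matrices=False)
        P = U[:, :r]                # top-r left singular vectors of G
        M = np.zeros((r, n))        # reset low-rank momentum on refresh
    G_low = P.T @ G                 # r x n: optimizer sees only low-rank stats
    M = beta * M + G_low
    W -= lr * (P @ M)               # project the update back to full size

err = np.linalg.norm(W - target) / np.linalg.norm(target)
```

The memory saving comes from `M` (and, in Adam, the second-moment buffer) having shape `r × n` rather than `m × n`; with a rank-4 projector the sketch still recovers only the dominant gradient directions per period, so the residual error shrinks but does not vanish.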