Natural Language Processing (NLP) aims to enable computers to understand and generate human language, facilitating human-computer interaction. Despite advances in NLP, large language models (LLMs) often fall short on complex planning tasks, such as decision-making and organizing actions, abilities that are crucial in a diverse array of applications from daily tasks to strategic…
Natural Language Processing (NLP) faces major challenges in addressing the limitations of decoder-only Transformers, the backbone of large language models (LLMs). These models contend with issues such as representational collapse and over-squashing, which can severely hinder their functionality. Representational collapse occurs when distinct input sequences are mapped to nearly identical internal representations, while over-squashing occurs when the model…
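As a rough illustration of representational collapse (a toy sketch, not the paper's setup): feed a small causal Transformer two long sequences that differ only in their first token and compare their final-token representations as length grows. Every size and token value below is an arbitrary illustrative choice.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy causal Transformer; sizes are illustrative, not from the paper.
d_model, vocab = 64, 16
embed = nn.Embedding(vocab, d_model)
layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=2)
encoder.eval()

@torch.no_grad()
def last_token_repr(tokens):
    x = embed(tokens.unsqueeze(0))                                  # (1, n, d_model)
    mask = nn.Transformer.generate_square_subsequent_mask(tokens.numel())
    return encoder(x, mask=mask)[0, -1]                             # final-token state

for n in (8, 64, 512):
    a = torch.zeros(n, dtype=torch.long)    # 0 0 0 ... 0
    b = a.clone(); b[0] = 1                 # 1 0 0 ... 0, differs only at the start
    gap = (last_token_repr(a) - last_token_repr(b)).norm().item()
    print(f"length {n:4d}: ||h_a - h_b|| = {gap:.4f}")
```

If the gap shrinks as the sequence grows, the two distinct inputs are becoming indistinguishable to the model's final state, which is the collapse phenomenon the summary describes.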
Fusion oncoproteins, proteins formed by chromosomal translocations, play a critical role in many cancers, especially pediatric ones. However, because they are large and structurally disordered, they are difficult to target with traditional drug design methods. To tackle this challenge, researchers at Duke University have developed FusOn-pLM, a novel protein language model specifically tailored…
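FusOn-pLM's actual architecture, training data, and masking strategy are not detailed here; the following is only a generic sketch of the masked-language-modeling objective that protein language models of this kind typically train with, over the 20-letter amino-acid alphabet. All sizes and the example sequence are hypothetical.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# 20 standard amino acids plus a [MASK] token; every size here is an
# illustrative choice, not FusOn-pLM's real configuration.
AA = "ACDEFGHIKLMNPQRSTVWY"
stoi = {a: i for i, a in enumerate(AA)}
MASK = len(AA)

d_model = 32
embed = nn.Embedding(len(AA) + 1, d_model)
layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=2)
head = nn.Linear(d_model, len(AA))   # predict the original residue at masked sites

def mlm_loss(seq, mask_rate=0.15):
    ids = torch.tensor([stoi[a] for a in seq])
    pos = torch.rand(len(ids)) < mask_rate
    if not pos.any():
        pos[0] = True                # ensure at least one masked position
    masked = ids.clone()
    masked[pos] = MASK
    logits = head(encoder(embed(masked.unsqueeze(0))))[0]
    return nn.functional.cross_entropy(logits[pos], ids[pos])

# A made-up peptide string, just to exercise the objective.
print(mlm_loss("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"))
```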
Sampling from complex, high-dimensional target distributions, such as the Boltzmann distribution, is critical across many areas of science. These distributions often arise in Combinatorial Optimization (CO) problems, which involve finding the best solutions within a vast space of possibilities. Sampling in such settings can become intricate due to the inherent challenge of obtaining…
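To make the setting concrete, here is a minimal sketch (not the paper's method) of Metropolis sampling from a Boltzmann distribution p(s) ∝ exp(-E(s)/T) over a toy Ising-style Max-Cut energy; the problem size, weights, and temperature are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy Max-Cut instance: spins s in {-1,+1}^n, symmetric weights, zero diagonal.
n = 12
W = rng.random((n, n))
W = np.triu(W, 1); W += W.T

def energy(s):
    # E(s) = 0.5 * sum_ij W_ij s_i s_j; lower energy means more cut weight.
    return 0.5 * s @ W @ s

def metropolis(steps=20000, T=0.5):
    s = rng.choice([-1, 1], size=n)
    E = energy(s)
    best, best_E = s.copy(), E
    for _ in range(steps):
        i = rng.integers(n)
        dE = -2 * s[i] * (W[i] @ s)          # energy change from flipping spin i
        if dE <= 0 or rng.random() < np.exp(-dE / T):
            s[i] = -s[i]
            E += dE
            if E < best_E:
                best, best_E = s.copy(), E
    return best, best_E

s, E = metropolis()
print("best energy found:", E)
```

The accept/reject rule is what makes the chain sample from the Boltzmann distribution rather than greedily descend, which is exactly why low-temperature or multimodal targets make such sampling hard.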
Research conducted by institutions like FAIR, Meta AI, Datashape, and INRIA explores the emergence of Chain-of-Thought (CoT) reasoning in large language models (LLMs). CoT enhances the capabilities of LLMs, enabling them to perform complex reasoning tasks even though they are not explicitly designed for them. Even though LLMs are trained primarily for next-token prediction, they…
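For context, "next-token prediction" here refers to the standard autoregressive objective, stated below for reference; note that nothing in this loss rewards intermediate reasoning steps explicitly, which is what makes the emergence of CoT notable.

$$\mathcal{L}(\theta) = -\sum_{t=1}^{T} \log p_\theta(x_t \mid x_{<t})$$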
Matrix multiplication (MatMul) is a fundamental operation in most neural network architectures. It appears as vector-matrix multiplication (VMM) in dense layers and as matrix-matrix multiplication (MMM) in self-attention mechanisms. This heavy reliance on MatMul is largely a product of how thoroughly GPUs are optimized for these operations. Libraries like cuBLAS and the Compute Unified Device…
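A small NumPy sketch of the two cases the summary names; the shapes are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 64, 10   # hidden size and sequence length, chosen arbitrarily

# Dense layer on a single token: activation vector times weight matrix (VMM).
x = rng.standard_normal(d)
W = rng.standard_normal((d, d))
h = x @ W                       # vector-matrix multiply, O(d^2)

# Self-attention scores: all queries against all keys (MMM).
Q = rng.standard_normal((n, d))
K = rng.standard_normal((n, d))
scores = Q @ K.T / np.sqrt(d)   # matrix-matrix multiply, O(n^2 * d)

print(h.shape, scores.shape)    # (64,) (10, 10)
```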
Researchers have identified cultural accumulation as a crucial aspect of human success: our capacity to learn skills and accumulate knowledge across generations. However, current artificial learning systems, such as deep reinforcement learning agents, frame learning as something that happens within a single "lifetime." This framing does not account for the generational and…
Stanford University researchers have developed a new method called Demonstration ITerated Task Optimization (DITTO), designed to align language model outputs directly with users' demonstrated behaviors. The technique addresses several challenges language models (LMs) face, including the need for large training datasets, generic responses, and mismatches between a universal style and…
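DITTO is reported to treat each user demonstration as preferred over the model's own samples and to optimize a DPO-style objective over those pairs. The sketch below shows only that core preference loss; the function name and the toy log-probabilities are placeholders, not DITTO's actual code or its full iterated pairing scheme.

```python
import torch
import torch.nn.functional as F

def dpo_loss(logp_w_policy, logp_l_policy, logp_w_ref, logp_l_ref, beta=0.1):
    # logp_* are summed token log-probs of the preferred completion
    # (w = the user's demonstration) and the dispreferred one
    # (l = a model sample) under the current policy and a frozen reference.
    margin = beta * ((logp_w_policy - logp_w_ref) - (logp_l_policy - logp_l_ref))
    return -F.logsigmoid(margin).mean()

# Toy numbers, just to show the call; real values come from model forward passes.
lw_p = torch.tensor([-12.0]); ll_p = torch.tensor([-15.0])
lw_r = torch.tensor([-13.0]); ll_r = torch.tensor([-14.0])
print(dpo_loss(lw_p, ll_p, lw_r, ll_r))
```

Minimizing this loss pushes the policy to assign relatively more probability to the demonstration than to its own sample, which is the alignment effect the summary describes.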