Researchers from the Shanghai AI Laboratory and TapTap have developed a Linear Attention Sequence Parallel (LASP) technique that optimizes sequence parallelism for linear transformers, side-stepping the limitations imposed by the memory capacity of a single GPU.
Large language models, due to their significant size and long sequences, can place a considerable strain on graphics processing unit…
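Linear attention, the building block LASP parallelizes, replaces softmax attention's quadratic interaction with a running key-value state that can be handed from one sequence chunk to the next. The single-head sketch below illustrates that property; the feature map `phi`, the shapes, and the chunk loop are illustrative assumptions, not LASP's actual kernels or communication scheme.

```python
import numpy as np

def phi(x):
    # Illustrative positive feature map (elu(x) + 1); an assumption,
    # not necessarily the map LASP uses.
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention_chunked(q, k, v, chunk=4):
    """Causal linear attention computed chunk by chunk.

    The running state S (sum of outer products phi(k_i) v_i^T) and the
    normalizer z (sum of phi(k_i)) are the only values that must cross
    chunk boundaries -- the property sequence parallelism exploits: each
    device can process its own chunk once it receives S and z from the
    previous one.
    """
    n, d = q.shape
    out = np.zeros_like(v)
    S = np.zeros((d, v.shape[1]))   # running sum of phi(k_i) v_i^T
    z = np.zeros(d)                 # running sum of phi(k_i)
    for start in range(0, n, chunk):
        for t in range(start, min(start + chunk, n)):
            S += np.outer(phi(k[t]), v[t])
            z += phi(k[t])
            q_t = phi(q[t])
            out[t] = (q_t @ S) / (q_t @ z + 1e-8)
    return out
```

Because the per-chunk computation depends on earlier chunks only through the fixed-size pair (S, z), splitting a long sequence across GPUs costs a small state transfer per boundary instead of materializing the full attention matrix.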
Artificial intelligence (AI) continues to make significant strides forward with the development of Viking, a cutting-edge language model designed to cater to Nordic languages alongside English and a range of programming languages. Developed by Silo AI, Europe's largest private AI lab, in partnership with the TurkuNLP research group at the University of Turku and HPLT,…
The development of large language models (LLMs) has historically been English-centric. While this has often proved successful, it has struggled to capture the richness and diversity of global languages. This issue is particularly pronounced with languages such as Korean, which boasts unique linguistic structures and deep cultural contexts. Nevertheless, the field of artificial intelligence (AI)…
Recent advancements in large language models (LLMs) and Multimodal Foundation Models (MMFMs) have sparked a surge of interest in large multimodal models (LMMs). LLMs and MMFMs, including models such as GPT-4 and LLaVA, have demonstrated exceptional performance in vision-language tasks, including Visual Question Answering and image captioning. However, these models also require high computational resources,…
Large Language Models (LLMs) and Multimodal Large Language Models (MLLMs) are key advancements in artificial intelligence (AI) capable of generating text, interpreting images, and understanding complex multimodal inputs, mimicking human intelligence. However, concerns arise due to their potential misuse and vulnerabilities to jailbreak attacks, where malicious inputs trick the models into generating harmful or objectionable…
The Theory of Inventive Problem Solving (TRIZ) is a widely recognized method of ideation that uses the knowledge derived from a large, ongoing patent database to systematically invent and solve engineering problems. TRIZ is increasingly incorporating various aspects of machine learning and natural language processing to enhance its reasoning process.
Now, researchers from both the Singapore…
Artificial intelligence, particularly large language models (LLMs), faces the critical challenge of balancing model performance against practical constraints such as privacy, cost, and device compatibility. Large cloud-based models that offer high accuracy rely on constant internet connectivity, raising potential issues of privacy breaches and high costs. Deploying these models on edge devices introduces further challenges in…
Alibaba's AI research division continues to establish a strong presence in the field of large language models (LLMs) with its new Qwen1.5-32B model, which features 32 billion parameters and an impressive 32k token context size. This latest addition to the Qwen series epitomizes Alibaba's commitment to high-performance computing balanced with resource efficiency.
The Qwen1.5-32B has superseded…
The increasingly sophisticated language models of today need vast quantities of text data for pretraining, often on the order of trillions of words. This poses a considerable problem for smaller languages that lack the necessary resources. To tackle this issue, researchers from the TurkuNLP Group, the University of Turku, Silo AI, the University of Helsinki,…
Large language models (LLMs) have received much acclaim for their ability to understand and process human language. However, these models tend to struggle with mathematical reasoning, a skill that requires a combination of logic and numeric understanding. This shortcoming has sparked interest in researching and developing methods to improve LLMs' mathematical abilities without downgrading their…
The concept of cascades in large language models (LLMs) has gained popularity for maintaining task performance while reducing inference cost. However, potential privacy issues can arise in handling sensitive user information, due to the interaction between local and remote models. Conventional cascade systems lack privacy-protecting mechanisms, causing sensitive data to be unintentionally transferred to the…
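In a typical LLM cascade, a cheap local model answers first and the query escalates to the remote model only when local confidence is low; the privacy concern above arises because the escalated query, which may contain sensitive data, leaves the device. A minimal sketch of that routing logic follows; the model interfaces, confidence scores, and threshold are illustrative assumptions, not any particular system's API.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

@dataclass
class Cascade:
    # Each callable returns (answer, confidence in [0, 1]).
    # These names and signatures are hypothetical, for illustration only.
    local_model: Callable[[str], Tuple[str, float]]
    remote_model: Callable[[str], Tuple[str, float]]
    threshold: float = 0.8

    def answer(self, query: str) -> Tuple[str, str]:
        """Return (answer, route), where route is 'local' or 'remote'."""
        ans, conf = self.local_model(query)
        if conf >= self.threshold:
            return ans, "local"
        # Escalation point: the raw query is sent off-device here.
        # In a naive cascade this is where sensitive data can leak.
        ans, _ = self.remote_model(query)
        return ans, "remote"
```

For example, with a toy local model that is only confident on short queries, a long (potentially sensitive) query would be routed off-device:

```python
cascade = Cascade(
    local_model=lambda q: ("short answer", 0.9 if len(q) < 20 else 0.3),
    remote_model=lambda q: ("detailed answer", 1.0),
)
```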