
Large Language Model

LASP: A Streamlined Machine Learning Technique Designed for Linear Attention-Based Language Models

Researchers from the Shanghai AI Laboratory and TapTap have developed a Linear Attention Sequence Parallel (LASP) technique that optimizes sequence parallelism for linear transformers, side-stepping the limitations imposed by the memory capacity of a single GPU. Large language models, due to their significant size and long sequences, can place a considerable strain on graphics processing units…
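To make the mechanism concrete, here is a minimal single-process sketch of the property that sequence parallelism exploits in linear attention: the past is summarized in a fixed-size running state, so a long sequence can be split into chunks (stand-ins for GPUs here) that hand off only that small state rather than full attention matrices. The elu+1 feature map, the shapes, and the chunking scheme are illustrative assumptions, not the LASP implementation.

```python
# Chunked causal linear attention: only the running state (S, z) crosses
# chunk boundaries, which is the property LASP-style sequence parallelism
# exploits across GPUs. Everything here is an illustrative sketch.
import numpy as np

def feature_map(x):
    # elu(x) + 1, a common positive feature map for linear attention
    return np.where(x > 0, x + 1.0, np.exp(x))

def chunked_linear_attention(Q, K, V, num_chunks=4):
    seq_len, d = Q.shape
    out = np.zeros_like(V)
    S = np.zeros((d, V.shape[1]))  # running sum of outer(phi(k), v)
    z = np.zeros(d)                # running sum of phi(k), for normalization
    for Qc, Kc, Vc, idx in zip(np.array_split(Q, num_chunks),
                               np.array_split(K, num_chunks),
                               np.array_split(V, num_chunks),
                               np.array_split(np.arange(seq_len), num_chunks)):
        phi_q, phi_k = feature_map(Qc), feature_map(Kc)
        for i in range(len(idx)):  # causal update within the chunk
            S += np.outer(phi_k[i], Vc[i])
            z += phi_k[i]
            out[idx[i]] = phi_q[i] @ S / (phi_q[i] @ z + 1e-6)
    return out

Q, K, V = (np.random.randn(128, 16) for _ in range(3))
print(chunked_linear_attention(Q, K, V).shape)  # (128, 16)
```

Because only S and z (a d-by-d matrix and a d-vector) move between chunks, communication cost stays independent of sequence length, which is what lets the sequence, rather than the model, be sharded across devices.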


Silo AI Unveils the Upcoming Viking Model Family: A Freely Available Language Model for Nordic Languages, English, and Programming Languages.

Artificial intelligence (AI) continues to make significant strides forward with the development of Viking, a cutting-edge language model designed to cater to Nordic languages alongside English and a range of programming languages. Developed by Silo AI, Europe's largest private AI lab, in partnership with the TurkuNLP research group at the University of Turku and HPLT,…


NAVER Cloud’s research team presents HyperCLOVA X: A Multilingual Language Model specially designed for the Korean language and culture.

The development of large language models (LLMs) has historically been English-centric. While this approach has often proved successful, it struggles to capture the richness and diversity of global languages. The issue is particularly pronounced with languages such as Korean, which boasts unique linguistic structures and deep cultural contexts. Nevertheless, the field of artificial intelligence (AI)…


Scientists from Intel Labs have unveiled LLaVA-Gemma, a compact vision-language model built on two versions of the Gemma large language model, Gemma-2B and Gemma-7B.

Recent advancements in large language models (LLMs) and Multimodal Foundation Models (MMFMs) have sparked a surge of interest in large multimodal models (LMMs). LLMs and MMFMs, including models such as GPT-4 and LLaVA, have demonstrated exceptional performance in vision-language tasks, including Visual Question Answering and image captioning. However, these models also require high computational resources,…


Assessing AI Model Safety via Red Teaming: An In-Depth Analysis of LLMs' and MLLMs' Resilience to Jailbreak Attacks and Prospective Enhancements

Large Language Models (LLMs) and Multimodal Large Language Models (MLLMs) are key advancements in artificial intelligence (AI) capable of generating text, interpreting images, and understanding complex multimodal inputs, mimicking human intelligence. However, concerns arise due to their potential misuse and vulnerabilities to jailbreak attacks, where malicious inputs trick the models into generating harmful or objectionable…


AutoTRIZ: A Creative AI Tool that Uses Large Language Models (LLMs) to Automate and Enhance the TRIZ (Theory of Inventive Problem Solving) Approach

The Theory of Inventive Problem Solving (TRIZ) is a widely recognized method of ideation that uses the knowledge derived from a large, ongoing patent database to systematically invent and solve engineering problems. TRIZ is increasingly incorporating various aspects of machine learning and natural language processing to enhance its reasoning process. Now, researchers from both the Singapore…
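As a rough illustration of what automating this flow with an LLM might look like, here is a hedged sketch: the model is first asked to name the engineering contradiction, then to propose solutions guided by classic TRIZ inventive principles. The `call_llm` stub and the three-entry principle table are hypothetical placeholders, not AutoTRIZ's actual prompts or pipeline.

```python
# Sketch of an LLM-driven TRIZ loop: problem -> contradiction ->
# principle-guided solution ideas. All names and prompts are assumptions.
PRINCIPLES = {1: "Segmentation", 15: "Dynamization", 35: "Parameter change"}

def call_llm(prompt: str) -> str:
    # hypothetical stub; swap in a real chat-completion call here
    return f"[model response to: {prompt[:40]}...]"

def autotriz_sketch(problem: str) -> str:
    contradiction = call_llm(
        f"State the core engineering contradiction in: {problem}"
    )
    guidance = ", ".join(PRINCIPLES.values())  # stand-in for the full 40
    return call_llm(
        f"Problem: {problem}\nContradiction: {contradiction}\n"
        f"Propose solutions using TRIZ principles such as: {guidance}"
    )

print(autotriz_sketch("Make a bicycle frame lighter without losing stiffness"))
```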


Stanford University researchers have unveiled Octopus v2, a tool that enhances on-device language models for improved super agent operations.

Artificial intelligence, particularly large language models (LLMs), faces the critical challenge of balancing model performance against practical constraints such as privacy, cost, and device compatibility. Large cloud-based models that offer high accuracy rely on constant internet connectivity, raising potential issues of privacy breaches and high costs. Deploying these models on edge devices introduces further challenges in…


Alibaba's Qwen team presents Qwen1.5-32B, a new multilingual dense language model that stands out with a 32k-token context and surpasses Mixtral on the Open LLM Leaderboard.

Alibaba's AI research division continues to establish a strong presence in the field of large language models (LLMs) with its new Qwen1.5-32B model, which features 32 billion parameters and an impressive 32k token context size. This latest addition to the Qwen series epitomizes Alibaba's commitment to high-performance computing balanced with resource efficiency. The Qwen1.5-32B has superseded…


Poro 34B: A 34B-Parameter AI Model Trained on 1 Trillion Tokens of English, Finnish, and Programming Languages, with a Special Focus on 8 Billion Tokens of Finnish-English Translation Pairs.

The increasingly sophisticated language models of today need vast quantities of text data for pretraining, often in the order of trillions of words. This poses a considerable problem for smaller languages that lack the necessary resources. To tackle this issue, researchers from the TurkuNLP Group, the University of Turku, Silo AI, the University of Helsinki,…


The ‘Self-Critique’ pipeline, an innovative approach to mathematical problem solving in large language models, has been unveiled by researchers at Zhipu AI and Tsinghua University.

Large language models (LLMs) have received much acclaim for their ability to understand and process human language. However, these models tend to struggle with mathematical reasoning, a skill that requires a combination of logic and numeric understanding. This shortcoming has sparked interest in researching and developing methods to improve LLMs' mathematical abilities without downgrading their…
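Although the teaser leaves the details to the paper, the generate-critique-refine pattern the name suggests can be sketched as follows; `call_llm` is a hypothetical stand-in for any chat-completion API, and the prompts and stopping rule are illustrative assumptions rather than the authors' actual pipeline.

```python
# Generate -> critique -> refine loop for math problems (illustrative).
def call_llm(prompt: str) -> str:
    # hypothetical stub; replace with a real model call
    return "OK"

def solve_with_self_critique(problem: str, max_rounds: int = 3) -> str:
    answer = call_llm(f"Solve step by step:\n{problem}")
    for _ in range(max_rounds):
        critique = call_llm(
            "Check this solution for logical or arithmetic errors; "
            f"reply 'OK' if it is correct.\nProblem: {problem}\nSolution: {answer}"
        )
        if critique.strip().upper().startswith("OK"):
            break  # the model judges its own answer correct
        answer = call_llm(
            f"Revise the solution using this critique.\nProblem: {problem}\n"
            f"Solution: {answer}\nCritique: {critique}"
        )
    return answer

print(solve_with_self_critique("If 3x + 5 = 20, what is x?"))
```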


Google AI researchers have developed a new privacy-centric cascade system to improve the performance of machine learning models.

The concept of cascades in large language models (LLMs) has gained popularity for delivering high task performance while reducing inference costs. However, potential privacy issues can arise in handling sensitive user information due to the interaction between local and remote models. Conventional cascade systems lack privacy-protecting mechanisms, causing sensitive data to be unintentionally transferred to the…
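A minimal sketch of that cascade pattern with a privacy guard added might look like the following, assuming a small on-device model, a larger remote model, and a redaction step before any text leaves the device; every name here is a hypothetical illustration of the pattern, not Google's system.

```python
# Privacy-conscious cascade: answer locally when confident, otherwise
# redact obvious identifiers and defer to the remote model (illustrative).
import re

EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b")

def redact(text: str) -> str:
    # strip obvious identifiers before the query goes off-device
    return EMAIL.sub("[EMAIL]", text)

def local_model(query: str) -> tuple[str, float]:
    return "local answer", 0.4   # placeholder (answer, confidence)

def remote_model(query: str) -> str:
    return "remote answer"       # placeholder for the large cloud model

def cascade(query: str, threshold: float = 0.7) -> str:
    answer, confidence = local_model(query)  # cheap, private first attempt
    if confidence >= threshold:
        return answer                        # nothing leaves the device
    return remote_model(redact(query))       # defer, but redact first

print(cascade("Summarize the email from alice@example.com"))
```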
