Language Model

Huawei’s AI paper presents a new theoretical framework centered on the memorization process and performance dynamics of Transformer-based language models (LMs).

Transformer-based neural networks have demonstrated remarkable capabilities in tasks such as text generation, editing, and question answering. These networks often improve as their parameter counts increase. Notably, some small models hold their own: the 2B-parameter MiniCPM performs comparably to larger models. Yet as computational resources for training these models increase, high-quality data availability…

This AI research paper from Huawei presents a theoretical framework centered on the memorization and performance dynamics of Transformer-based language models.

Transformer-based neural networks have demonstrated proficiency in a variety of tasks, such as text generation, editing, and question answering. Perplexity and end-task accuracy measurements consistently show that models with more parameters perform better, leading industry to develop larger models. However, a larger model does not always guarantee superior performance. The 2-billion-parameter model MiniCPM,…
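
For readers unfamiliar with the perplexity metric mentioned in this excerpt, here is a minimal sketch of how it is commonly computed: the exponential of the negative mean per-token log-probability. The function name `perplexity` and the sample values are illustrative assumptions and do not come from the Huawei paper.

```python
import math


def perplexity(token_log_probs):
    """Perplexity from per-token natural-log probabilities.

    Hypothetical helper for illustration: computes
    exp(-(1/N) * sum(log p_i)) over the N tokens of a sequence.
    """
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_log_prob = sum(token_log_probs) / len(token_log_probs)
    return math.exp(-mean_log_prob)


# Illustrative values only: a model that assigns probability 0.25
# to every token is as uncertain as a uniform choice among 4 options.
sample = [math.log(0.25)] * 4
print(perplexity(sample))
```

Lower perplexity means the model assigns higher probability to the observed text.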

Researchers from Columbia University and Databricks compare LoRA and full finetuning in large language models.

Researchers from Columbia University and Databricks Mosaic AI have conducted a comparative study of full finetuning and Low-Rank Adaptation (LoRA), a parameter-efficient finetuning method, in large language models (LLMs). The efficient finetuning of LLMs, which can contain billions of parameters, is an ongoing challenge due to the substantial GPU memory required. This makes the process…
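
To make the "parameter-efficient" claim concrete, here is a rough sketch of the trainable-parameter arithmetic behind LoRA, which freezes a weight matrix and trains two low-rank factors instead. The helper name `lora_param_counts`, the 4096-wide layer, and the rank of 16 are assumptions chosen for illustration, not settings taken from the study.

```python
def lora_param_counts(d_in: int, d_out: int, rank: int) -> tuple[int, int]:
    """Return (full, lora) trainable-parameter counts for one weight matrix.

    Full finetuning updates every entry of the d_out x d_in matrix.
    LoRA instead trains two factors, B (d_out x rank) and A (rank x d_in),
    whose product is added to the frozen weight.
    """
    full = d_in * d_out
    lora = rank * (d_in + d_out)
    return full, lora


# Hypothetical layer size and rank, for illustration only.
full, lora = lora_param_counts(d_in=4096, d_out=4096, rank=16)
print(full, lora, full / lora)  # 16777216 131072 128.0
```

Under these assumed dimensions LoRA trains 128 times fewer parameters for that matrix, which is where much of the optimizer-state memory saving comes from.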

This AI research paper from Stanford University assesses the effectiveness of multimodal foundation models as they scale from few-shot to many-shot in-context learning (ICL).

Recent research suggests that providing demonstration examples in the prompt, known as in-context learning (ICL), significantly enhances the performance of large language models (LLMs) and large multimodal models (LMMs). Studies have shown improvements in LLM performance with more in-context examples, particularly on out-of-domain tasks. These findings are driven by newer models such as GPT-4o and Gemini 1.5 Pro, which include longer…

Comparing GPT-4 and GPT-4o: An Overview of Major Changes and Comparative Study

The world of artificial intelligence (AI) and machine learning continues to evolve at a rapid pace, with OpenAI leading the charge. Their latest development is GPT-4o (the "o" stands for "omni"), a multimodal successor to the widely used GPT-4 in the Generative Pre-trained Transformer model series, which is renowned for its natural language processing capabilities. GPT-4 boasts enhanced contextual…

01.AI has launched Yi-1.5-34B, an upgraded version of the original Yi model. It was further pre-trained on a high-quality corpus of 500 billion tokens and fine-tuned on 3 million diverse samples.

The world of Artificial Intelligence (AI) has taken another step forward with the introduction of the Yi-1.5-34B model by 01.AI. This model is considered a significant upgrade over prior versions, bridging the gap between the capabilities of the Llama 3 8B and 70B models. The distinguishing features of the Yi-1.5-34B include improvements in multimodal…

SpeechVerse: A Multimodal AI Framework That Enables LLMs to Understand and Perform a Wide Range of Speech-Processing Tasks via Natural Language Instructions.

Large language models (LLMs) have been successful at natural language tasks and instruction following, yet they struggle with non-textual data such as images and audio. An approach that integrates textual LLMs with speech encoders in a single training setup could change this. One option is multimodal audio-language models, proving advantageous…

This study by Google DeepMind examines the performance gap between online and offline methods for AI alignment.

The standard method for aligning large language models (LLMs) is Reinforcement Learning from Human Feedback (RLHF). However, recent offline alignment methods, such as Direct Preference Optimization (DPO), challenge RLHF's reliance on on-policy sampling. Unlike online methods, offline algorithms use existing datasets, making them simpler, cheaper, and often more…
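
As background on why offline methods such as DPO need no on-policy sampling, here is a minimal sketch of the per-example DPO loss, which depends only on log-probabilities of a fixed preference pair under the policy and a frozen reference model. The function name `dpo_loss`, the argument names, and the default `beta=0.1` are illustrative assumptions; this is not the DeepMind study's code.

```python
import math


def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Per-example DPO loss from sequence log-probabilities.

    Hypothetical helper for illustration. Computes
    -log(sigmoid(beta * margin)), where margin is the policy's log-ratio
    against the reference on the chosen response minus the same log-ratio
    on the rejected response.
    """
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    z = -beta * margin
    # -log(sigmoid(x)) equals softplus(-x); this form avoids overflow.
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


# Illustrative log-probabilities only. The policy favors the chosen
# response more than the reference does, so the loss is below log(2).
print(dpo_loss(-1.0, -3.0, -2.0, -2.0))
```

When the policy and reference agree exactly, the margin is zero and the loss equals log(2); training pushes the margin up using only the stored dataset.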

Meta AI presents Chameleon: a new family of early-fusion, token-based foundation models that sets a new benchmark for multimodal machine learning.

Recent multimodal foundation models are often limited in their ability to fuse various modalities, as they typically utilize distinct encoders or decoders for each modality. This structure limits their capability to effectively integrate varied content types and create multimodal documents with interwoven sequences of images and text. Meta researchers, in response to this limitation, have…

Research from Stanford and UC Berkeley highlights how ChatGPT’s behavior changes over time.

Large Language Models (LLMs) such as GPT-3.5 and GPT-4 have recently garnered substantial attention in the Artificial Intelligence (AI) community for their ability to process vast amounts of data, detect patterns, and simulate human-like language in response to prompts. These LLMs can be updated over time, drawing upon new information and user…
