Large language models (LLMs) such as GPT-4, LLaMA, and PaLM are playing a significant role in advancing the field of artificial intelligence. However, these models decode autoregressively, generating one token per forward pass, which leads to high inference latency. To address this, researchers have pursued two broad approaches to efficient LLM inference: one that requires additional training and one that does not.
One such approach applies Knowledge Distillation (KD) to autoregressive LLMs, training a student model to minimize the reverse KL divergence to a teacher model. Nevertheless, conventional KD methods were found to be ineffective for LLMs. Hence, researchers from Shanghai Jiao Tong University and the University of California proposed a more effective family of models known as Consistency Large Language Models (CLLMs).
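For intuition, a reverse-KL distillation objective of this kind can be written over per-token logits roughly as in the sketch below. This is an illustrative sketch only, not the authors' code; the function name and tensor layout are assumptions.

```python
# Illustrative sketch of a reverse-KL distillation loss, KL(student || teacher),
# computed from per-token logits. Names and shapes are assumptions, not the paper's code.
import torch
import torch.nn.functional as F

def reverse_kl_loss(student_logits: torch.Tensor, teacher_logits: torch.Tensor) -> torch.Tensor:
    """Mean reverse KL divergence KL(q_student || p_teacher) over all token positions.

    Both inputs are assumed to have shape [batch, seq_len, vocab]. Taking the
    expectation under the student distribution makes the objective mode-seeking.
    """
    log_q = F.log_softmax(student_logits, dim=-1)   # student log-probs
    log_p = F.log_softmax(teacher_logits, dim=-1)   # teacher log-probs
    q = log_q.exp()
    # KL(q || p) = sum_x q(x) * (log q(x) - log p(x)), summed over the vocabulary
    kl = (q * (log_q - log_p)).sum(dim=-1)
    return kl.mean()
```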
CLLMs are designed for Jacobi decoding, a parallel decoding method that reduces latency by iteratively refining a whole block of token guesses until it converges to the autoregressive output. Unlike previous approaches, CLLMs require no additional memory for auxiliary model components, and they outperform alternatives such as speculative decoding and Medusa. Trained on approximately 1M tokens for LLaMA-7B, CLLMs proved to be 3.4× faster on the Spider dataset. This speed-up is attributed to fast forwarding, the correct prediction of several consecutive tokens in a single forward pass, and to stationary tokens, which are predicted correctly and remain unchanged even when preceded by inaccurate tokens.
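To make the decoding mechanism concrete, here is a minimal sketch of greedy Jacobi decoding, assuming `model(input_ids)` is a causal LM that returns next-token logits of shape `[batch, seq_len, vocab]`. The helper name, signature, and initialization are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of greedy Jacobi decoding (illustrative, not the authors' code).
import torch

@torch.no_grad()
def jacobi_decode_block(model, prefix_ids, block_len, max_iters=None, pad_id=0):
    """Decode `block_len` new tokens after `prefix_ids` via Jacobi fixed-point iteration."""
    device = prefix_ids.device
    # Start from an arbitrary initial guess for the whole block (here: pad tokens).
    guess = torch.full((1, block_len), pad_id, dtype=torch.long, device=device)
    max_iters = max_iters or block_len  # converges in at most block_len iterations

    for _ in range(max_iters):
        # One parallel forward pass over prefix + current block of guesses.
        logits = model(torch.cat([prefix_ids, guess], dim=1))
        # Greedy next-token prediction for every position in the block.
        new_guess = logits[:, prefix_ids.shape[1] - 1 : -1, :].argmax(dim=-1)
        if torch.equal(new_guess, guess):   # fixed point reached: block matches greedy AR output
            break
        guess = new_guess
    return guess
```

The fixed point of this iteration matches what greedy autoregressive decoding would produce, which is why fast-forwarded and stationary tokens translate directly into fewer forward passes.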
The findings indicate that the counts of both fast-forwarded and stationary tokens improve by 2.0× to 6.8× across all four datasets. Furthermore, CLLMs deliver larger gains on domain-specific datasets than on open-domain data, as profiled on MT-bench.
Tests were performed to evaluate the performance and inference speedup of CLLMs across multiple tasks. The results showed that CLLMs achieve a 2.4× to 3.4× speedup with Jacobi decoding and nearly no accuracy loss on domain-specific benchmarks such as GSM8K, CodeSearchNet Python, and Spider. On ShareGPT, CLLMs achieved a 2.4× speedup with comparable quality, scoring 6.4 on the open-domain MT-bench benchmark.
To sum up, the researchers introduced CLLMs, a new family of LLMs that significantly enhances the efficiency of Jacobi decoding. Because CLLMs are adapted from a pre-trained LLM, they avoid the complexity of managing two different models within a system, and they increase the number of tokens generated per forward pass across different datasets. The researchers' work on CLLMs represents a significant advancement in the field of AI and language model development.