A new method lets AI chatbots hold long, continuous conversations without crashing.

Researchers from MIT and elsewhere have developed a fix for a known problem: a chatbot's performance deteriorates rapidly during long, continuous dialogue with a human. They trace the problem to the way large language models manage the memory that holds the conversation so far. Their method, called StreamingLLM, keeps a few key data points in the memory cache, which lets a chatbot continue a conversation of any length without a drop in performance.

Large language models encode data, such as the words in a user query, into representations known as tokens. An attention mechanism uses these tokens to generate new text. As a dialogue grows longer, more tokens are generated and stored in a memory cache known as the key-value (KV) cache. A larger cache slows down computation, and the cache has a fixed memory limit. The usual workaround is to evict the oldest tokens once that limit is exceeded, but the model's performance often drops abruptly as soon as the first token is removed.
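The eviction behaviour described above can be illustrated with a toy cache. This is a minimal sketch, not the researchers' code: the class name `SlidingKVCache`, the `capacity` parameter, and the one-element key and value lists are assumptions made for illustration.

```python
from collections import deque


class SlidingKVCache:
    """Toy KV cache that keeps only the most recent `capacity` tokens."""

    def __init__(self, capacity):
        # A deque with maxlen silently drops the oldest entry when full.
        self.entries = deque(maxlen=capacity)

    def append(self, token_id, key, value):
        self.entries.append((token_id, key, value))

    def token_ids(self):
        return [token_id for token_id, _, _ in self.entries]


cache = SlidingKVCache(capacity=4)
for t in range(6):
    # Placeholder keys and values; a real model stores tensors per layer.
    cache.append(t, key=[float(t)], value=[float(t)])

print(cache.token_ids())  # [2, 3, 4, 5]
```

After six tokens, tokens 0 and 1 are gone, including the very first token of the conversation.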

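To counter this, the researchers proposed keeping the first few tokens in the cache even when the memory limit forces other tokens out. This may seem illogical, since the first token of a conversation usually has little to do with the word being generated. The explanation lies in the attention mechanism: the attention scores a token assigns must sum to 1, so whatever is left over after scoring the relevant tokens has to go somewhere. The researchers found that models tend to place this remainder on the first token, treating it as an “attention sink”. The first token must therefore stay in the cache to preserve the model's behaviour.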
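A small numerical sketch may help show why a sink is needed. Attention weights come from a softmax, which always sums to 1. The logit values below are invented for illustration (index 0 stands in for the first token), and the example only shows the arithmetic, not how a trained model behaves.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical attention logits: index 0 is the first token (the sink),
# the others are tokens that are only weakly relevant to the query.
logits = [4.0, 0.5, 0.2, 0.1, 0.3]

weights = softmax(logits)
# Evicting the first token forces the same total weight of 1 onto the rest.
weights_without_sink = softmax(logits[1:])

print(round(sum(weights), 6), round(sum(weights_without_sink), 6))
print(weights[1] < weights_without_sink[0])
```

With the sink present, the weakly relevant tokens receive small weights. Once the sink is removed, the softmax redistributes its share across them, so the attention pattern changes sharply.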

Building on this insight, the research team designed StreamingLLM and found that holding four attention sink tokens at the beginning of the cache gives the best performance. They also found that each token's positional encoding must stay the same in the KV cache, even as other tokens are pushed out.
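The cache policy described above can be sketched as follows: pin the first four tokens and roll a window over the rest. This is an illustrative toy, not the StreamingLLM implementation. The class name `SinkKVCache` and its parameters are assumptions, and storing each token's original position simply follows the article's description of positional encoding.

```python
from collections import deque


class SinkKVCache:
    """Toy cache: the first `n_sink` tokens are pinned, the rest roll."""

    def __init__(self, capacity, n_sink=4):
        if capacity <= n_sink:
            raise ValueError("capacity must exceed n_sink")
        self.n_sink = n_sink
        self.sinks = []  # never evicted
        self.recent = deque(maxlen=capacity - n_sink)  # oldest drops first
        self.next_position = 0

    def append(self, key, value):
        # Each entry keeps the position it had when it entered the cache.
        entry = (self.next_position, key, value)
        if len(self.sinks) < self.n_sink:
            self.sinks.append(entry)
        else:
            self.recent.append(entry)
        self.next_position += 1

    def positions(self):
        return [p for p, _, _ in self.sinks] + [p for p, _, _ in self.recent]


cache = SinkKVCache(capacity=8, n_sink=4)
for t in range(12):
    cache.append(key=[float(t)], value=[float(t)])

print(cache.positions())  # [0, 1, 2, 3, 8, 9, 10, 11]
```

Tokens 4 to 7 have been evicted, while the four sink tokens and the four most recent tokens remain with their original positions.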

StreamingLLM significantly outperformed a popular alternative that avoids the performance drop by recomputing part of the past conversation. With a cache of 256 tokens, the recomputation method took roughly twice as long as StreamingLLM to decode a new token. With a cache of 4096 tokens, the gap in decoding speed is starker still.
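One way to see why the gap widens with cache size is to count attention score computations per new token. The sketch below assumes that recomputation rebuilds causal attention over every token in the window, while a reused cache only scores the new token against the cached entries. Both function names are hypothetical, and operation counts are not timings: the measured gap depends on hardware and implementation.

```python
def scores_with_recompute(window):
    """Query-key scores needed to rebuild causal attention over the window."""
    # Token i attends to tokens 0..i, so it needs i + 1 scores.
    return sum(i + 1 for i in range(window))


def scores_with_cache_reuse(window):
    """Scores needed when cached keys are reused: one per cached token."""
    return window


for window in (256, 4096):
    ratio = scores_with_recompute(window) / scores_with_cache_reuse(window)
    print(window, ratio)
```

Under these assumptions the recomputation cost grows quadratically with the window while the reuse cost grows linearly, which is consistent with the gap widening between 256 and 4096 tokens.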

Other academics in the field consider StreamingLLM promising because of its implications for a wide range of AI applications. Notably, it has already been used to deploy the conversational AI model Mistral on iPhones, and NVIDIA has incorporated it into TensorRT-LLM, its large language model optimisation library. The method still has shortfalls, including an inability to recall evicted tokens or past conversations, which will be the focus of future research. The project was funded by the MIT-IBM Watson AI Lab, the MIT Science Hub, and the U.S. National Science Foundation.
