Researchers from MIT and other institutions have identified why AI chatbot conversations can break down and developed a solution that enables continuous dialogue. The issue lies in the chatbot’s key-value cache, which acts as a kind of conversational memory. In some models, the earliest data is discarded once the cache reaches its limit, causing the bot to fail. The researchers’ solution, StreamingLLM, keeps those initial pieces of data in memory, allowing extended chat sessions.

StreamingLLM keeps a model performing at its best, even in long conversations exceeding four million words. It proved more efficient than another method that avoids crashing by continually recomputing portions of the past dialogue, running more than 22 times faster. This breakthrough could let AI chatbots maintain day-long conversations without requiring frequent reboots, making them more efficient assistants for tasks such as copywriting, editing, or generating code.

The researchers note that large language models encode data, such as the words in a user’s query, as tokens, and that an attention mechanism builds an “attention map” measuring how strongly each token relates to every other. When the cache grows very large, computing this attention map slows generation down, and the model’s performance degrades if encoding the content requires more tokens than the cache can hold.
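
As a rough illustration of why a growing cache slows things down, here is a minimal sketch of scaled dot-product attention over a key-value cache (illustrative Python, not the researchers’ code): each new token is scored against every cached token, so the attention map, and the per-token work, grows with the cache.

```python
import numpy as np

def attention(query, keys, values):
    """Scaled dot-product attention for a single new token.

    query:  (d,)    representation of the token being generated
    keys:   (n, d)  cached key vectors, one per earlier token
    values: (n, d)  cached value vectors, one per earlier token
    """
    scores = keys @ query / np.sqrt(query.shape[-1])  # one score per cached token
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                          # softmax: one row of the attention map
    return weights @ values                           # weighted mix of cached values

# The work per generated token scales with the number of cached tokens (n),
# which is why an ever-growing cache slows decoding down.
d = 64
rng = np.random.default_rng(0)
out = attention(rng.normal(size=d), rng.normal(size=(1000, d)), rng.normal(size=(1000, d)))
```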

Researchers have previously used a “sliding cache” approach, which evicts the oldest tokens to make room for new ones. However, this strategy causes a substantial drop in performance as soon as the very first token is ousted. The team discovered that keeping that first token, which acts as an “attention sink,” in the sliding cache preserves model performance even when the cache limit is exceeded. They also found that, for optimal performance, the streaming cache should keep four attention sink tokens at the start.
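
The eviction rule this describes is simple to sketch. The snippet below is illustrative only, assuming the cache is a Python list of per-token entries, a hypothetical CACHE_LIMIT, and the four attention sink tokens mentioned above; it is not the authors’ implementation.

```python
NUM_SINK_TOKENS = 4      # keep the first four tokens as "attention sinks"
CACHE_LIMIT = 1024       # illustrative cache size in tokens

def evict(cache):
    """Drop the oldest non-sink entries until the cache fits its limit.

    `cache` is a list of cached entries (e.g. key/value pairs), oldest first.
    The first NUM_SINK_TOKENS entries are never evicted.
    """
    overflow = len(cache) - CACHE_LIMIT
    if overflow <= 0:
        return cache
    sinks = cache[:NUM_SINK_TOKENS]
    recent = cache[NUM_SINK_TOKENS + overflow:]   # slide the window past the oldest entries
    return sinks + recent

cache = list(range(1030))          # stand-in entries: 1,030 cached tokens
cache = evict(cache)
assert cache[:4] == [0, 1, 2, 3]   # the attention sinks survive
assert len(cache) == CACHE_LIMIT
```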

Furthermore, they found that positional encodings must follow each token’s slot in the cache rather than its position in the original text. For example, when token 5 is evicted and token 6 becomes the fifth token in the cache, it is encoded as position 5 rather than 6, so the positions the model sees stay within the range it was trained on no matter how long the conversation grows.
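
A small sketch of that positional bookkeeping, again illustrative rather than the researchers’ code: positions are assigned by slot in the cache, so they stay bounded by the cache size however long the conversation runs.

```python
def cache_positions(cached_token_ids):
    """Assign positional encodings by slot in the cache, not by position in the full text."""
    return list(range(len(cached_token_ids)))

# Four attention sinks plus a sliding window taken from deep into a long conversation.
cached = [0, 1, 2, 3, 1_000_001, 1_000_002, 1_000_003]
print(cache_positions(cached))   # [0, 1, 2, 3, 4, 5, 6] -- never exceeds the cache size
```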

By implementing these two ideas, StreamingLLM could sustain a never-ending conversation and outperformed a popular method employing recomputation. For instance, with a 256-token cache, StreamingLLM took only 31 milliseconds to decode a new token, compared to 63 milliseconds for the recomputation method. At a cache size of 4,096 tokens, StreamingLLM required just 65 milliseconds, while recomputation needed 1,411 milliseconds.

Despite these advances, StreamingLLM cannot remember words once they leave the cache. In future research, the team aims to address this shortcoming by enabling the model to recall previous conversations. StreamingLLM has already been integrated into NVIDIA’s large language model optimization library, TensorRT-LLM.
