Artificial intelligence (AI) models, and particularly large language models (LLMs), are not as robust at performing tasks in unfamiliar scenarios as they are often portrayed to be, according to a study by researchers from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL).

The researchers focused on the performance of models like GPT-4 and Claude when handling “default tasks,” normal scenarios a model is trained and tested on, and “counterfactual scenarios,” which deviate from the normal and are usually unfamiliar to the models. In an effort to take the study outside the models’ comfort zones, the researchers modified existing tasks, using a variety of datasets and benchmarks tailored specifically to the models’ capabilities.

A prime example is base-10 arithmetic. While models perform well in this familiar number base, performance typically drops substantially when they work in other bases, suggesting their arithmetic skills are less general than they may seem. Other tasks, such as chess problems with altered starting positions or musical chord fingering, also saw drops in performance, implying the models struggle to adapt to new situations when they cannot rely on rote memorization.
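To make the base-arithmetic contrast concrete, here is a minimal Python sketch of how a default (base-10) and a counterfactual (base-9) addition query could be constructed and scored. This is an illustration under assumptions, not the study’s actual evaluation code: the helper names `to_base` and `make_addition_prompt`, the prompt wording, and the choice of base 9 are all hypothetical.

```python
def to_base(n: int, base: int) -> str:
    """Render a non-negative integer in the given base (assumes 2 <= base <= 10)."""
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        digits.append(str(n % base))
        n //= base
    return "".join(reversed(digits))


def make_addition_prompt(a: int, b: int, base: int) -> tuple[str, str]:
    """Build an addition prompt and its expected answer, with operands and
    answer all written in the target base."""
    prompt = (
        f"You are doing addition in base-{base}. "
        f"What is {to_base(a, base)} + {to_base(b, base)}? Answer with digits only."
    )
    expected = to_base(a + b, base)
    return prompt, expected


if __name__ == "__main__":
    a, b = 27, 58
    # base 10 = the familiar "default" task; base 9 = a counterfactual variant
    for base in (10, 9):
        prompt, expected = make_addition_prompt(a, b, base)
        print(f"[base {base}] {prompt}")
        print(f"[base {base}] expected answer: {expected}")
        # A model's reply would be scored by exact match against `expected`;
        # the study reports that accuracy typically drops on the non-default base.
```

The underlying arithmetic rule is identical in both cases; only the surface representation changes, which is what makes the comparison a test of generalization rather than of task difficulty.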

The study’s lead author, Zhaofeng Wu, describes this observation as a fascinating aspect of LLMs. He emphasizes the importance of understanding this limitation as we aim to expand the models’ application horizons.

However, the study has limitations. The tasks and settings it examines do not fully represent the range of challenges the models could encounter in real-world applications, suggesting the need for more diverse testing environments in the future. Despite these limitations, the study provides vital insights into the workings and limits of LLMs and has the potential to influence the design of future models for improved robustness.

Furthermore, the researchers aim to make LLMs more interpretable in order to better understand their decision-making processes. This would help discern whether these models are genuinely generalizing to unseen tasks or simply memorizing training data.

Assistant Professor Hao Peng of the University of Illinois at Urbana-Champaign praised the study for addressing an important open question about the capabilities of state-of-the-art LLMs. He believes the research could inspire further work on identifying LLMs’ failure modes and developing better models.

The research team presented their work at the North American Chapter of the Association for Computational Linguistics (NAACL) last month. The study was supported in part by the MIT–IBM Watson AI Lab, the MIT Quest for Intelligence, and the National Science Foundation.
