An AI research paper from UC Berkeley shows that coupling GPT with Prolog, a reliable symbolic reasoning system, significantly improves its ability to solve mathematical problems.

Researchers from the University of California, Berkeley, have recently examined how to improve the performance of large language models (LLMs) in natural language processing (NLP). Despite their strong language comprehension, LLMs show limitations in reliable and flexible reasoning. This can be attributed to the architecture of transformers, the underlying technology that powers LLMs: transformers generate solutions token by token through next-word prediction, which hinders their capacity to backtrack, rectify mistakes, or adapt to problems outside their training distribution.

Recognition of these shortcomings has prompted exploration into augmenting LLMs with external tools and symbolic reasoning modules. Tools such as calculators, code interpreters, and external datasets have been employed to boost reasoning ability. While this approach has succeeded in reducing arithmetic errors, it does not fully counter the deficiencies arising from LLMs' next-word prediction approach and the linear nature of text.

The researchers propose incorporating a reliable deductive reasoning module into the inference pipeline. The method prompts the model to encode the constraints and relationships among the variables described in the problem statement as a set of Prolog code statements. A Prolog interpreter then evaluates the generated code and deductively derives an accurate answer. This approach is significant because it mirrors the human separation of linguistic and reasoning systems, and it notably enhances LLMs' mathematical reasoning capability.
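To make the division of labor concrete, here is a minimal sketch of what such generated code could look like. The word problem, the predicate name solve/2, and the specific constraints are illustrative assumptions, not examples taken from the paper; the sketch assumes SWI-Prolog with its library(clpfd) finite-domain constraint solver.

```prolog
% Hypothetical word problem: "Alice is 3 years older than Bob,
% and the sum of their ages is 21. How old is each of them?"
:- use_module(library(clpfd)).   % SWI-Prolog constraint solver over integers

solve(Alice, Bob) :-
    [Alice, Bob] ins 0..150,     % plausible age range for both variables
    Alice #= Bob + 3,            % "Alice is 3 years older than Bob"
    Alice + Bob #= 21,           % "the sum of their ages is 21"
    label([Alice, Bob]).         % search for a concrete assignment

% Query: ?- solve(Alice, Bob).
% Alice = 12, Bob = 9.
```

Once the LLM has emitted the clauses, the answer follows entirely from evaluating the query; arithmetic and logical consistency are guaranteed by the interpreter rather than by next-word prediction.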

In addition, the researchers introduce a new dataset, the Non-Linear Reasoning (NLR) dataset, designed to evaluate an LLM's mathematical reasoning competence. The NLR dataset addresses issues in existing benchmarks, such as overlap between test and training sets and repetitive reasoning patterns. It comprises unique constraint problems, mathematical word problems, and problems that involve following algorithmic instructions for updating a game model. As part of the evaluation, GPT-3.5 Turbo and GPT-4 were run through multiple experiments with varying numbers of entangled variables, that is, variables whose values depend on one another and so cannot be resolved one at a time. As the number of entangled variables increased, the models' ability to solve the problems dropped significantly, as the sketch below illustrates.
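For a concrete sense of what an entangled system looks like, consider this hypothetical three-variable constraint problem in the same Prolog style; the predicate puzzle/3 and its constraints are invented for illustration and are not drawn from the NLR dataset. No single constraint determines any one variable on its own, so a strictly linear, left-to-right solve fails, while a constraint solver resolves the system jointly.

```prolog
:- use_module(library(clpfd)).

% Hypothetical entangled system: every constraint relates two or
% more unknowns, so none can be fixed in isolation.
puzzle(X, Y, Z) :-
    [X, Y, Z] ins 1..20,   % finite search domain for each variable
    X + Y #= 2 * Z,        % entangles X, Y, and Z
    Y + Z #= X + 9,        % entangles X, Y, and Z
    2 * X #= Y + 4,        % entangles X and Y
    label([X, Y, Z]).      % search for a concrete assignment

% Query: ?- puzzle(X, Y, Z).
% X = 6, Y = 8, Z = 7.
```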

In summary, the research highlights the inherent limitations of LLMs in performing reliable and flexible reasoning and proposes integrating a dependable deductive reasoning module into the inference pipeline. This neurosymbolic approach prompts the LLM to translate the information encoded in problem statements into logical code statements. The resulting division of labor notably improves the LLMs' performance on mathematical reasoning tasks. The NLR dataset is posited as a strong benchmark for gauging LLMs' ability to handle non-linear reasoning problems that resist the standard next-word prediction approach. This research could lead to substantial advancements in the performance and applicability of LLMs in the NLP field.
