Researchers from the Massachusetts Institute of Technology (MIT) and other institutions have proposed a new technique that allows large language models (LLMs) to solve tasks involving natural language, math and data analysis, and symbolic reasoning by generating programs. Known as natural language embedded programs (NLEPs), the approach has a language model create and execute a Python program that solves a user's query, then present the solution in natural language.

LLMs have demonstrated notable performance in tasks such as drafting legal briefs, analyzing customer review sentiment, and translating documents. However, due to their reliance on natural language to process information and answer queries, these models may struggle with tasks involving numerical or symbolic reasoning.

The team designed NLEPs to generate a step-by-step Python program, embedding the required natural language within the program. An NLEP is a problem-solving template comprising four stages: calling the necessary packages or functions, importing natural language representations of the required knowledge, implementing a function that calculates the solution, and producing the result in a line of natural language.
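For concreteness, here is a minimal sketch of what a generated NLEP might look like for a simple symbolic query, assuming the four-stage template described above; the question, variable names, and function are illustrative and not taken from the paper.

```python
# Hypothetical NLEP for the query:
# "Which of these dates fall on a weekend: 2024-06-01, 2024-06-03, 2024-06-08?"

# Stage 1: call the necessary packages or functions
from datetime import date

# Stage 2: import natural language representations of the required knowledge
dates_in_question = ["2024-06-01", "2024-06-03", "2024-06-08"]
weekend_days = {5, 6}  # Saturday and Sunday (Monday == 0)

# Stage 3: implement a function that calculates the solution
def find_weekend_dates(dates):
    weekend_dates = []
    for d in dates:
        year, month, day = map(int, d.split("-"))
        if date(year, month, day).weekday() in weekend_days:
            weekend_dates.append(d)
    return weekend_dates

# Stage 4: produce the result in a line of natural language
result = find_weekend_dates(dates_in_question)
print(f"The dates that fall on a weekend are: {', '.join(result)}.")
```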

NLEPs have enabled LLMs to achieve higher accuracy across a broad spectrum of reasoning tasks. With GPT-4, the method exceeded 90% accuracy on a range of symbolic reasoning tasks, was 30% more accurate than task-specific prompting methods, and also showed improvements over open-source LLMs.

In addition to being efficient, NLEPs are also transparent. Users can inspect the generated program, correct errors in the code directly, and avoid rerunning the entire model. If a user has similar queries, the model needs to generate only one core program; certain variables can then be swapped for each new query without rerunning the model.
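As an illustrative, hypothetical example of that reuse pattern, continuing the weekend-date sketch above: answering a similar follow-up question only requires editing the knowledge variable, while the imports and solver function stay the same.

```python
# Hypothetical reuse of the core program from the earlier sketch:
# only the knowledge variable changes for the new, similar query.
from datetime import date

weekend_days = {5, 6}  # Saturday and Sunday (Monday == 0)

def find_weekend_dates(dates):
    # Core solver, unchanged from the original program.
    return [d for d in dates
            if date(*map(int, d.split("-"))).weekday() in weekend_days]

# Edited by hand for the follow-up question; no model rerun needed.
dates_in_question = ["2024-07-04", "2024-07-06"]

print("The dates that fall on a weekend are: "
      + ", ".join(find_weekend_dates(dates_in_question)) + ".")
```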

Unlike many popular LLMs, which function solely by predicting the next word or token from natural-language input, NLEPs have the model generate Python code with natural language embedded in the program, which helps reduce reasoning errors.

NLEPs have wider implications beyond accuracy and efficiency. They can enhance data privacy: because the programs can run locally, sensitive user data need not be sent to an external company such as Google or OpenAI for processing. They can also enable small language models to perform better without the costly process of retraining the model for a specific task.

Despite these successes, there are limitations. NLEPs depend on the model's program-generation capability, which is weaker in smaller models trained on limited data. The researchers plan to investigate ways of enabling smaller LLMs to generate more effective NLEPs, and to study how prompt variations affect the generated programs in order to make the model's reasoning more robust.
