
Improving the Precision and Brevity of Responses in Large Language Models Using Constrained Chain-of-Thought Prompting

With advancements in model architectures and training methods, Large Language Models (LLMs) such as OpenAI’s GPT-3 have showcased impressive capabilities in handling complex question-answering tasks. However, these detailed responses can also contain hallucinations, where the model generates plausible but incorrect information. The problem is compounded by the fact that LLMs generate responses token by token, so longer, more detailed responses take significantly longer to produce.

To address this challenge, researchers from the Department of Excellence in Robotics and AI at Scuola Superiore Sant’Anna and Mediavoice Srl have turned to prompt engineering, where techniques such as chain-of-thought (CoT) prompting guide the model through intermediate reasoning steps. While CoT improves the explainability and accuracy of responses, it also leads to longer outputs and therefore longer response times.

The team has proposed a new strategy termed ‘Constrained-Chain-of-Thought’ (CCoT), which limits output length to improve both accuracy and response time without degrading the quality of responses. Alongside CCoT, they have also defined metrics that evaluate conciseness and correctness together, taking into account both the length and the accuracy of the responses generated.
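To make the idea concrete, the sketch below shows how a CCoT-style prompt might be constructed, assuming the length constraint is expressed as a plain-language instruction appended to a standard chain-of-thought prompt. The helper name build_ccot_prompt, the exact wording of the instruction, and the example question are illustrative assumptions, not the authors’ template.

```python
# Minimal sketch of a CCoT-style prompt (assumption: the constraint is a
# plain-language instruction appended to a standard CoT prompt; this is not
# the authors' exact template).

def build_ccot_prompt(question: str, max_words: int = 100) -> str:
    """Build a chain-of-thought prompt whose reasoning is capped at max_words."""
    return (
        f"Q: {question}\n"
        "A: Let's think step by step "
        f"and limit the length of the answer to {max_words} words."
    )


if __name__ == "__main__":
    # Illustrative GSM8K-style arithmetic question.
    q = (
        "A baker made 48 muffins in the morning and half as many in the "
        "afternoon. How many muffins did the baker make in total?"
    )
    print(build_ccot_prompt(q, max_words=100))
```

The constrained prompt can then be sent to any instruction-following model; the 100-word budget matches the setting reported in the experiments below.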

Experiments with LLaMA2-70b on the GSM8K dataset demonstrated that constraining the reasoning to 100 words significantly improved accuracy while reducing output length. Larger models were shown to benefit more from CCoT than smaller ones, gaining in both efficiency and conciseness-aware accuracy.
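As a rough illustration of how conciseness and correctness can be evaluated jointly, the sketch below counts a response as a hit only when it is both correct and within a word budget. The function name concise_accuracy and this exact formulation are assumptions for illustration, not the paper’s metric definitions.

```python
# Hedged sketch of a conciseness-aware accuracy metric (assumption: a response
# counts only if it is both correct and no longer than a word budget; this is
# not the paper's exact formulation).

from typing import Callable, Sequence


def concise_accuracy(
    responses: Sequence[str],
    references: Sequence[str],
    is_correct: Callable[[str, str], bool],
    max_words: int = 100,
) -> float:
    """Fraction of responses that are correct AND within the word budget."""
    if not responses:
        return 0.0
    hits = 0
    for response, reference in zip(responses, references):
        within_budget = len(response.split()) <= max_words
        if within_budget and is_correct(response, reference):
            hits += 1
    return hits / len(responses)
```

A plain accuracy score ignores length entirely; a metric of this kind instead rewards models that stay correct while respecting the output-length constraint.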

The researchers stress the importance of concise responses from LLMs and the need to balance conciseness with correctness. Future research will look into how these metrics could be integrated into model fine-tuning and explore the influence of conciseness on phenomena such as hallucinations and incorrect reasoning, potentially leading to more sophisticated and efficient use of LLMs across a variety of applications.
