Large language models (LLMs) such as GPT-4, PaLM, and LLaMA have shown impressive performance on reasoning tasks, driven by larger model sizes and more effective prompting methods. These prompting methods generally fall into two categories: single-query and multi-query. Both have limitations, most notably the manual effort needed to design them and their computational cost.
Single-query methods often rely on prior assumptions or relevant exemplars, so a reasoning system has to be designed task by task, which is impractical at scale. Multi-query methods, on the other hand, are computationally demanding because they keep expanding reasoning paths to find a suitable structure for each task. Neither kind extracts general, high-level guidelines from completed tasks that could be reused to improve accuracy and efficiency.
To overcome these challenges, researchers from Peking University, UC Berkeley, and Stanford University have developed Buffer of Thoughts (BoT), an approach designed to improve the reasoning accuracy, efficiency, and robustness of LLMs across a wide range of tasks. BoT features a meta-buffer that stores generalizable, high-level thoughts, called thought-templates, distilled from various problem-solving processes. These templates can be retrieved and reused when a similar problem comes up.
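The retrieve-and-reuse idea can be illustrated with a minimal sketch. Everything here is an assumption made for illustration rather than the authors' implementation: the ThoughtTemplate class, the retrieve and instantiate functions, the word-overlap (Jaccard) similarity used in place of a learned similarity measure, and the 0.15 threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ThoughtTemplate:
    """Hypothetical container for one entry of the meta-buffer."""
    name: str
    description: str  # what kind of problem the template addresses
    steps: str        # high-level, task-agnostic solution outline


def tokenize(text: str) -> set:
    """Split text into a set of lowercase whitespace-separated tokens."""
    return set(text.lower().split())


def jaccard(a: set, b: set) -> float:
    """Word-overlap similarity; a crude stand-in for a real similarity model."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def retrieve(meta_buffer, problem: str, threshold: float = 0.15):
    """Return the most similar template, or None if nothing is close enough."""
    query = tokenize(problem)
    best, best_score = None, 0.0
    for template in meta_buffer:
        score = jaccard(query, tokenize(template.description))
        if score > best_score:
            best, best_score = template, score
    return best if best_score >= threshold else None


def instantiate(template: ThoughtTemplate, problem: str) -> str:
    """Combine the reusable outline with the concrete problem into a prompt."""
    return f"Problem: {problem}\nFollow this outline:\n{template.steps}"


meta_buffer = [
    ThoughtTemplate(
        name="quadratic",
        description="solve a quadratic equation for its roots",
        steps="1. Identify coefficients. 2. Factor or apply the formula. 3. Check roots.",
    ),
    ThoughtTemplate(
        name="chess",
        description="find the best move in a chess position",
        steps="1. List legal moves. 2. Evaluate each. 3. Pick the strongest.",
    ),
]

problem = "solve the quadratic equation x^2 - 5x + 6 = 0"
match = retrieve(meta_buffer, problem)
if match is not None:
    print(instantiate(match, problem))
```

When no stored template is similar enough, retrieve returns None. In this sketch that is the point where a system would have to fall back to reasoning from scratch.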
BoT also includes a buffer manager that dynamically updates the meta-buffer, so its capacity grows as more tasks are completed. The method offers three benefits: improved accuracy from reusing shared thought-templates, more efficient reasoning from drawing directly on informative historical reasoning structures, and greater robustness from retrieving and instantiating thoughts in a way that resembles human problem solving. Experimental results across a varied set of tasks support these claims, with BoT significantly improving accuracy, efficiency, and robustness.
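The update step can be sketched in the same spirit. This is a toy model under stated assumptions, not the published algorithm: BufferManager, its update method, the novelty_threshold parameter, and the rule of skipping near-duplicate templates are hypothetical, and toy_distill stands in for what would in practice be an LLM call that summarizes a solved task into a reusable template.

```python
from typing import Callable, List


def similarity(a: str, b: str) -> float:
    """Word-overlap (Jaccard) similarity between two strings."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


class BufferManager:
    """Hypothetical manager that grows a meta-buffer as tasks are solved."""

    def __init__(self, distill: Callable[[str, str], str],
                 novelty_threshold: float = 0.5):
        self.distill = distill                    # turns (problem, solution) into a template
        self.novelty_threshold = novelty_threshold
        self.meta_buffer: List[str] = []

    def update(self, problem: str, solution: str) -> bool:
        """Distill a template and store it only if it is sufficiently new."""
        template = self.distill(problem, solution)
        for existing in self.meta_buffer:
            if similarity(template, existing) >= self.novelty_threshold:
                return False                      # near-duplicate: skip it
        self.meta_buffer.append(template)
        return True


def toy_distill(problem: str, solution: str) -> str:
    """Placeholder for an LLM-based distillation step (assumption)."""
    return solution.lower()


manager = BufferManager(toy_distill)
first = manager.update("Solve x^2 - 5x + 6 = 0",
                       "Factor the quadratic then read off the roots")
second = manager.update("Solve x^2 - 7x + 12 = 0",
                        "Factor the quadratic then read off the roots")
third = manager.update("Is there a mate in one?",
                       "Enumerate legal moves and test each for checkmate")
print(first, second, third, len(manager.meta_buffer))
```

The novelty check is one possible design choice for keeping the buffer from filling up with redundant entries. The source text says only that the buffer is updated dynamically as tasks are completed.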
However, BoT has limitations of its own, especially on problems that require human-like ingenuity, since such problems usually lack a precise thought-template. The quality of the thought-templates can also suffer if a weaker model is used to initialize the meta-buffer. To address these gaps, future work could integrate BoT with external resources to build an open-domain system, or optimize the distillation of thought-templates so that they work better on complex tasks.
In essence, BoT is a promising tool for improving the performance of LLMs in real-world applications. Advancing the method further will require addressing the limitations described above.