A team of researchers from Meta FAIR has been studying Large Language Models (LLMs) and found that they can produce more nuanced responses by distilling System 2 reasoning methods into System 1 responses. While System 1 operates quickly and directly, generating an answer from the prompt without intermediate steps, System 2 relies on intermediate strategies, such as generating intermediate reasoning tokens and repeatedly prompting the model, to produce more detailed and thoughtful responses.
System 2 methods often produce more accurate results because their reasoning is explicit. However, they require more compute and incur higher latency, so they are generally unsuitable for production systems, which tend to rely on the faster System 1 generation. The research focuses on optimizing this trade-off by distilling the advantages of System 2 into System 1.
A variety of System 2 methods have been developed to improve the final answers of LLMs, such as Rephrase and Respond, System 2 Attention, and Branch-Solve-Merge. These strategies insert intermediate reasoning steps between the prompt and the final answer to improve its quality and accuracy.
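As a rough illustration of what such a method looks like at inference time, the sketch below shows a two-stage Rephrase and Respond call: the model first spends extra tokens restating the question, then answers the expanded version. The `generate` helper and the prompt wording are assumptions made for illustration, not code or prompts from the paper.

```python
# Illustrative sketch of a two-stage System 2 method (Rephrase and Respond).
# `generate` is a hypothetical stand-in for a single LLM call; the prompt
# wording is an assumption, not taken from the paper.

def generate(prompt: str) -> str:
    """Placeholder for one call to the underlying LLM."""
    raise NotImplementedError("wire this to your model of choice")

def rephrase_and_respond(question: str) -> str:
    # Stage 1: spend extra tokens restating the question in richer detail.
    rephrased = generate(
        "Rephrase and expand the following question, adding any details "
        f"needed to answer it precisely:\n\n{question}"
    )
    # Stage 2: answer the expanded question; this intermediate text is the
    # "System 2" reasoning that a plain System 1 generation would skip.
    return generate(
        f"Original question: {question}\n"
        f"Expanded question: {rephrased}\n"
        "Answer the expanded question."
    )
```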
The team found that by using self-supervised methods to distill the high-quality outputs of System 2 back into System 1 generations, they could retain the quality of System 2 reasoning. This eliminated the need to generate intermediate reasoning token sequences at inference time, resulting in a more efficient process with lower computing costs.
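One plausible instantiation of that recipe, building on the Rephrase and Respond sketch above, is outlined below: run the System 2 method several times per unlabeled input, keep only the inputs whose final answers largely agree (self-consistency acting as the unsupervised filter), and fine-tune the base model on the surviving input-answer pairs with the intermediate reasoning discarded. The helper names here are placeholders, not the paper's implementation.

```python
# Minimal sketch of the self-supervised distillation recipe described above,
# assuming `rephrase_and_respond` from the previous snippet as the System 2
# method. `majority_vote`, `build_distillation_set`, and the thresholds are
# illustrative placeholders, not APIs from the paper or any library.
from collections import Counter

def majority_vote(answers: list[str]) -> tuple[str, int]:
    """Return the most common answer and how often it occurred."""
    answer, count = Counter(answers).most_common(1)[0]
    return answer, count

def build_distillation_set(unlabeled_inputs, num_samples=8, min_agreement=6):
    """Collect (prompt, answer) pairs whose System 2 outputs self-agree."""
    training_pairs = []
    for x in unlabeled_inputs:
        # Run the expensive System 2 pipeline several times per input.
        answers = [rephrase_and_respond(x) for _ in range(num_samples)]
        answer, count = majority_vote(answers)
        # Self-consistency acts as the unsupervised quality filter:
        # keep the example only if the sampled answers largely agree.
        if count >= min_agreement:
            # Only the final answer is kept; the intermediate reasoning
            # tokens are discarded before fine-tuning.
            training_pairs.append({"prompt": x, "completion": answer})
    return training_pairs

# The distilled System 1 model is then obtained by fine-tuning the base LLM
# directly on these pairs, so inference needs only a single direct generation.
```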
The findings suggest that many System 2 methods can be effectively distilled into System 1, reducing computational costs while maintaining the quality of responses. The researchers note that such distillation will be important for future AI systems, allowing System 2 resources to be reserved for complex reasoning tasks while streamlined System 1 responses handle simpler ones.
By distilling System 2 reasoning methods into System 1, the researchers have taken a significant step forward in AI capabilities. The distillation process preserves the quality and accuracy of the model's output while cutting the computational costs associated with System 2 methods, making it a practical option for real-world applications where resources must be used efficiently.