
Researchers from Stanford and MIT have unveiled Stream of Search (SoS), a machine learning framework designed to let language models learn to solve problems by searching in language, without relying on any external assistance.

To improve the planning and problem-solving capabilities of language models, researchers from Stanford University, MIT, and Harvey Mudd have introduced a method called Stream of Search (SoS). The method trains language models on search sequences serialized as strings: it presents the models with problems and solutions expressed in the language they understand, teaching them to search and to backtrack while problem-solving.
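To make the idea concrete, here is a minimal sketch of what "serializing a search as a string" could look like for Countdown: a depth-first search over number states that logs every step, including dead ends and backtracking, as plain text. The trace format and helper names are illustrative assumptions, not the authors' exact scheme.

```python
# Hypothetical sketch: a depth-first Countdown search whose every step is
# logged as text, producing a "stream of search" a language model could be
# trained on. The trace format is an assumption, not the paper's exact one.
from itertools import combinations

def countdown_stream(numbers, target, trace):
    """Search for `target` by combining pairs of numbers; log each step."""
    trace.append(f"state: {sorted(numbers)} target: {target}")
    if target in numbers:
        trace.append("goal reached")
        return True, trace
    if len(numbers) == 1:
        trace.append("dead end, backtracking")
        return False, trace
    for a, b in combinations(numbers, 2):
        rest = list(numbers)
        rest.remove(a)
        rest.remove(b)
        # Candidate operations on the chosen pair (integer division only).
        candidates = {f"{a}+{b}": a + b,
                      f"{a}*{b}": a * b,
                      f"{max(a,b)}-{min(a,b)}": max(a, b) - min(a, b)}
        if b != 0 and a % b == 0:
            candidates[f"{a}/{b}"] = a // b
        for op, val in candidates.items():
            trace.append(f"try {op} = {val}")
            solved, trace = countdown_stream(rest + [val], target, trace)
            if solved:
                return True, trace
    trace.append("dead end, backtracking")
    return False, trace

solved, trace = countdown_stream([2, 3, 5], 25, [])
```

Crucially, the resulting string keeps the unproductive branches: the model sees the search wander into a dead end and recover, not just the final answer.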

SoS aims to enhance language models’ capacity for complex decision-making, planning, and reasoning by teaching them strategies for avoiding pitfalls such as error compounding and the challenges inherent in lookahead tasks. Too often, language models are trained only on polished final solutions, so their ability to reason and plan beyond the next step is hindered by a lack of exposure to “fruitful mistakes.” The SoS method pinpoints and addresses this gap.

The approach was showcased on the game of Countdown, using a unified language to describe search processes. Training a transformer-based language model on streams of search improved accuracy by 25%, and further refinement with policy-improvement techniques solved an additional 36% of previously unsolved problems. The results affirm SoS’s potential to let language models learn independently, improving their problem-solving capabilities and autonomously discovering new strategies.

Notably, models trained on suboptimal search trajectories outperformed those trained only on optimal solutions, highlighting the value of exposure to error-filled processes for developing adaptive and robust problem-solving strategies. The researchers also explored self-improvement strategies based on reinforcement learning, such as expert iteration and Advantage-Induced Policy Alignment (APA), which contributed to improved efficiency and accuracy.
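The expert-iteration idea mentioned above can be sketched as a simple loop: sample search trajectories from the current model, keep only the ones that actually solve their problem, and fine-tune the model on those successes. The interfaces and the toy "model" below are placeholders for illustration, not the authors' implementation.

```python
# Hedged sketch of an expert-iteration loop: sample, filter for correctness,
# fine-tune on the model's own successful trajectories. All interfaces here
# are illustrative assumptions, not the paper's actual code.
import random

def expert_iteration(model, problems, sample, is_correct, finetune, rounds=3):
    for _ in range(rounds):
        # 1) Sample one candidate search trajectory per problem.
        candidates = [(p, sample(model, p)) for p in problems]
        # 2) Keep only trajectories that reach a correct solution.
        winners = [(p, t) for p, t in candidates if is_correct(p, t)]
        # 3) Fine-tune the model on its own successes.
        model = finetune(model, winners)
    return model

# Toy stand-ins: the "model" is just a success probability that rises as it
# is fine-tuned on more of its own correct trajectories.
random.seed(0)
sample = lambda m, p: "solved" if random.random() < m else "failed"
is_correct = lambda p, t: t == "solved"
finetune = lambda m, winners: min(1.0, m + 0.05 * len(winners))

final = expert_iteration(0.3, list(range(20)), sample, is_correct, finetune)
```

The filtering step is what drives improvement: because only correct trajectories are kept, each round of fine-tuning shifts the policy toward search behavior that works, without any external supervision.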

The SoS framework represents an advance in teaching language models to learn and problem-solve through search. It helps models backtrack, explore alternatives, and reduce errors, and it enables them to learn internal “world models” for search, potentially improving generalization. Since the framework was tested only on the Countdown game, more work is needed to verify its effectiveness on more complex real-world tasks. Future research could extend SoS by incorporating formalizable operations and exploring its transferability across domains.

Perhaps most promisingly, the SoS method shows that language models can excel at problem-solving through a combination of diverse search strategies and iterative refinement. Beyond improving performance on the Countdown game, the approach could lay the groundwork for broader, real-world applications of language models. With further development, these models could gain the problem-solving and reasoning abilities needed to tackle complex tasks and scenarios. The researchers plan to continue refining the process, potentially unlocking new capabilities of language models.
