
Recursive IntroSpEction (RISE): A Machine Learning Method for Fine-Tuning LLMs to Improve Their Responses Over Multiple Turns

Large language models (LLMs) are powerful tools for many tasks, but using them as general-purpose decision-making agents poses unique challenges. To function effectively as agents, LLMs must go beyond generating plausible text completions and exhibit interactive, goal-directed behavior to complete specific tasks. Two critical abilities this requires are actively seeking task-relevant information and making decisions that can be improved through “thinking” and verification at inference time. However, current methods often struggle to achieve these capabilities, particularly on complex tasks that require logical reasoning.

Several approaches have been tried to improve the reasoning and thinking capabilities of foundation models for downstream applications. These primarily focus on developing prompting techniques that enable effective multi-turn interaction with external tools; refining predictions sequentially through reflection, verbalized thought, self-criticism, and revision; and using other models to critique responses. Despite promising results, these techniques often rely on detailed error traces or external feedback to succeed.

Prior studies have highlighted both the effectiveness and the limitations of prompting techniques and of fine-tuning LLMs for self-improvement. Explored strategies include training on self-generated responses, learned verifiers, search algorithms, contrastive prompting on negative data, and iterated supervised or reinforcement learning.

Researchers from Carnegie Mellon University, UC Berkeley, and MultiOn have proposed RISE (Recursive IntroSpEction), a new approach for improving the self-improvement capabilities of LLMs. RISE uses an iterative fine-tuning procedure that frames single-turn prompts as multi-turn Markov decision processes (MDPs). Drawing on principles from online imitation learning and reinforcement learning, it develops strategies for multi-turn data collection and training that enable LLMs to recursively detect and correct their mistakes in subsequent iterations.
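
At a high level, this procedure alternates between collecting on-policy multi-turn rollouts and fine-tuning on relabeled data. The Python sketch below illustrates that outer loop under stated assumptions; `collect_rollouts`, `improve_responses`, and `finetune` are hypothetical stand-ins for illustration, not the authors' implementation.

```python
# High-level sketch of RISE's outer loop (illustrative; every helper here
# is a hypothetical stand-in, not the authors' code).

def collect_rollouts(model, problems):
    """Stand-in: roll the current model out on each problem for several turns."""
    return [(p, model(p)) for p in problems]

def improve_responses(model, episodes):
    """Stand-in: relabel each turn with a better response
    (from a stronger teacher model or from the model itself)."""
    return episodes

def finetune(model, dataset):
    """Stand-in: one supervised fine-tuning pass on the relabeled data."""
    return model

def rise_training_loop(model, problems, num_iterations=3):
    """Alternate on-policy data collection with supervised fine-tuning,
    following the online-imitation-learning recipe the paper builds on."""
    for _ in range(num_iterations):
        episodes = collect_rollouts(model, problems)   # multi-turn on-policy data
        dataset = improve_responses(model, episodes)   # expert or self-generated targets
        model = finetune(model, dataset)               # train to predict improved responses
    return model
```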

RISE works by converting a single-turn problem into a multi-turn Markov decision process: the prompt becomes the initial state, and each model response is an action. The next state is formed by concatenating the current state, the model's action, and a fixed introspection prompt. To obtain improved responses for training, RISE uses either distillation from a more capable model or self-distillation, and then trains the model with supervised learning so that it enhances its predictions across sequential attempts.
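
The following minimal Python sketch illustrates this state construction during a rollout. The `generate` callable and the wording of `INTROSPECTION_PROMPT` are assumptions for illustration, not the exact prompt or API used in the paper.

```python
from typing import Callable, List, Tuple

# Assumed wording; the paper's fixed introspection prompt may differ.
INTROSPECTION_PROMPT = (
    "Your previous answer may contain mistakes. "
    "Review it, correct any errors, and answer again."
)

def rollout(generate: Callable[[str], str],
            problem: str,
            num_turns: int = 3) -> List[Tuple[str, str]]:
    """Roll out a multi-turn episode for a single-turn problem."""
    state = problem                       # initial state s_0 is the prompt
    trajectory = []
    for _ in range(num_turns):
        action = generate(state)          # action a_t = the model's response
        trajectory.append((state, action))
        # next state s_{t+1} = concat(s_t, a_t, fixed introspection prompt)
        state = "\n\n".join([state, action, INTROSPECTION_PROMPT])
    return trajectory

# Usage with a dummy generator standing in for an actual LLM call:
if __name__ == "__main__":
    episode = rollout(lambda s: "attempted answer", "What is 17 * 24?")
    for t, (s, a) in enumerate(episode):
        print(f"turn {t}: state length {len(s)}, action: {a!r}")
```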

RISE delivers substantial improvements across multiple benchmarks. Its effectiveness holds across different base models, with Mistral-7B + RISE outperforming Eurus-7B-SFT, a model specifically fine-tuned for math reasoning. By converting single-turn problems into multi-turn MDPs and applying iterative reinforcement learning to on-policy rollout data with either expert or self-generated supervision, RISE significantly enhances the self-improvement capabilities of 7B models.
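
To make the two supervision modes concrete, the sketch below contrasts expert distillation with one plausible self-distillation rule: sample several candidates from the model itself and keep one that an answer checker marks correct. The best-of-N selection and the `answer_checker` oracle are assumptions; the paper's exact relabeling rule may differ.

```python
import random
from typing import Callable, Optional

def expert_target(teacher_generate: Callable[[str], str], state: str) -> str:
    """Distillation sketch: query a more capable teacher model for the
    improved response at this state."""
    return teacher_generate(state)

def self_distill_target(generate: Callable[[str], str],
                        state: str,
                        answer_checker: Callable[[str], bool],
                        n_samples: int = 8) -> Optional[str]:
    """Self-distillation sketch: sample several candidate responses from the
    model itself and keep one the checker marks correct (best-of-N).
    answer_checker is a hypothetical oracle, e.g. exact match against the
    known final answer on a math benchmark."""
    candidates = [generate(state) for _ in range(n_samples)]
    correct = [c for c in candidates if answer_checker(c)]
    return random.choice(correct) if correct else None  # skip the turn if none succeed
```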

RISE presents a promising direction for advancing the self-improvement capabilities of LLMs. Computational constraints currently impose some limitations, particularly with self-generated supervision; however, the technique is highly promising and is expected to open new doors for performance improvement in LLMs.
