
Recursive IntroSpEction (RISE): A Machine Learning Method for Fine-Tuning LLMs to Improve Their Responses Over Multiple Turns

Large language models (LLMs) are powerful tools for many tasks, but using them as general-purpose decision-making agents poses unique challenges. To function effectively as agents, LLMs must go beyond generating plausible text completions and exhibit interactive, goal-directed behavior to complete specific tasks. Two critical abilities this requires are actively seeking task-relevant information and making decisions that can be improved through “thinking” and verification at inference time. However, current methods often struggle to achieve these capabilities, particularly on complex tasks that require logical reasoning.

Several approaches have been tried to improve the reasoning and thinking capabilities of foundation models for downstream applications. These primarily focus on developing prompting techniques that enable effective multi-turn interaction with external tools; refining predictions sequentially through reflection, verbalized thought, self-criticism, and revision; and using other models to critique responses. Despite promising results, these techniques often rely on detailed error traces or external feedback to succeed.

Prior studies have highlighted both the effectiveness and the limitations of prompting techniques and of fine-tuning LLMs for self-improvement. Explored strategies include training on self-generated responses, learned verifiers, search algorithms, contrastive prompting on negative data, and iterated supervised or reinforcement learning.

Researchers from Carnegie Mellon University, UC Berkeley, and MultiOn have proposed RISE (Recursive IntroSpEction), a new approach for improving the self-improvement capabilities of LLMs. RISE uses an iterative fine-tuning procedure that frames single-turn prompts as multi-turn Markov decision processes (MDPs). Drawing on principles from online imitation learning and reinforcement learning, it develops strategies for multi-turn data collection and training that enable LLMs to recursively detect and correct their mistakes in subsequent iterations.
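
At a high level, this procedure alternates between collecting on-policy multi-turn rollouts and fine-tuning on relabeled data. The Python sketch below illustrates that outer loop under stated assumptions; `collect_rollouts`, `improve_responses`, and `finetune` are hypothetical stand-ins for illustration, not the authors' implementation.

```python
# High-level sketch of RISE's outer loop (illustrative; every helper here
# is a hypothetical stand-in, not the authors' code).

def collect_rollouts(model, problems):
    """Stand-in: roll the current model out on each problem for several turns."""
    return [(p, model(p)) for p in problems]

def improve_responses(model, episodes):
    """Stand-in: relabel each turn with a better response
    (from a stronger teacher model or from the model itself)."""
    return episodes

def finetune(model, dataset):
    """Stand-in: one supervised fine-tuning pass on the relabeled data."""
    return model

def rise_training_loop(model, problems, num_iterations=3):
    """Alternate on-policy data collection with supervised fine-tuning,
    following the online-imitation-learning recipe the paper builds on."""
    for _ in range(num_iterations):
        episodes = collect_rollouts(model, problems)   # multi-turn on-policy data
        dataset = improve_responses(model, episodes)   # expert or self-generated targets
        model = finetune(model, dataset)               # train to predict improved responses
    return model
```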

RISE works by converting a single-turn problem into a multi-turn Markov decision process: the prompt becomes the initial state, and each model response is an action. The next state is formed by concatenating the current state, the model's action, and a fixed introspection prompt. To obtain improved responses for training, RISE uses either distillation from a more capable model or self-distillation, and then trains the model with supervised learning so that it enhances its predictions across sequential attempts.
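
The following minimal Python sketch illustrates this state construction during a rollout. The `generate` callable and the wording of `INTROSPECTION_PROMPT` are assumptions for illustration, not the exact prompt or API used in the paper.

```python
from typing import Callable, List, Tuple

# Assumed wording; the paper's fixed introspection prompt may differ.
INTROSPECTION_PROMPT = (
    "Your previous answer may contain mistakes. "
    "Review it, correct any errors, and answer again."
)

def rollout(generate: Callable[[str], str],
            problem: str,
            num_turns: int = 3) -> List[Tuple[str, str]]:
    """Roll out a multi-turn episode for a single-turn problem."""
    state = problem                       # initial state s_0 is the prompt
    trajectory = []
    for _ in range(num_turns):
        action = generate(state)          # action a_t = the model's response
        trajectory.append((state, action))
        # next state s_{t+1} = concat(s_t, a_t, fixed introspection prompt)
        state = "\n\n".join([state, action, INTROSPECTION_PROMPT])
    return trajectory

# Usage with a dummy generator standing in for an actual LLM call:
if __name__ == "__main__":
    episode = rollout(lambda s: "attempted answer", "What is 17 * 24?")
    for t, (s, a) in enumerate(episode):
        print(f"turn {t}: state length {len(s)}, action: {a!r}")
```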

RISE delivers substantial improvements across multiple benchmarks. Its effectiveness holds across different base models, with Mistral-7B + RISE outperforming Eurus-7B-SFT, a model specifically fine-tuned for math reasoning. By converting single-turn problems into multi-turn MDPs and applying iterative reinforcement learning to on-policy rollout data with either expert or self-generated supervision, RISE significantly enhances the self-improvement capabilities of 7B models.
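
To make the two supervision modes concrete, the sketch below contrasts expert distillation with one plausible self-distillation rule: sample several candidates from the model itself and keep one that an answer checker marks correct. The best-of-N selection and the `answer_checker` oracle are assumptions; the paper's exact relabeling rule may differ.

```python
import random
from typing import Callable, Optional

def expert_target(teacher_generate: Callable[[str], str], state: str) -> str:
    """Distillation sketch: query a more capable teacher model for the
    improved response at this state."""
    return teacher_generate(state)

def self_distill_target(generate: Callable[[str], str],
                        state: str,
                        answer_checker: Callable[[str], bool],
                        n_samples: int = 8) -> Optional[str]:
    """Self-distillation sketch: sample several candidate responses from the
    model itself and keep one the checker marks correct (best-of-N).
    answer_checker is a hypothetical oracle, e.g. exact match against the
    known final answer on a math benchmark."""
    candidates = [generate(state) for _ in range(n_samples)]
    correct = [c for c in candidates if answer_checker(c)]
    return random.choice(correct) if correct else None  # skip the turn if none succeed
```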

RISE presents a promising direction for advancing the self-improvement capabilities of LLMs. Computational constraints currently impose some limitations, particularly with self-generated supervision; however, the technique is highly promising and is expected to open new doors for performance improvement in LLMs.
