
Google AI presents PERL, a parameter-efficient reinforcement learning methodology. The technique uses LoRA to train a reward model and fine-tune a language model policy while updating only a small fraction of the parameters.

Google’s researchers have introduced Parameter-Efficient Reinforcement Learning (PERL), a methodology that makes Reinforcement Learning from Human Feedback (RLHF) for Large Language Models (LLMs) more efficient and more broadly applicable. The standard RLHF process is computationally intense and requires vast resources, which restricts its widespread use. PERL addresses this by applying LoRA (Low-Rank Adaptation), which significantly reduces computational and memory requirements by training only a small set of added parameters.

RLHF improves the alignment of LLMs with human values, making them more reliable, but it is resource-intensive and computationally heavy. In RLHF, a reward model is fit on human-preferred outputs, and a reinforcement learning algorithm such as PPO then trains the policy against that reward model. Collecting human feedback for the reward model is expensive, which has motivated alternatives such as RLAIF (using AI feedback) and Parameter-Efficient Fine-Tuning (PEFT) methods like LoRA to cut the cost of training itself. LoRA trains only a small fraction of the total parameters by factorizing the weight updates into trainable low-rank matrices while keeping the original weights frozen.
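To make the low-rank factorization concrete, here is a minimal sketch of a LoRA-adapted linear layer in PyTorch. It is not the PERL paper's implementation; the rank, scaling factor, and initialization are illustrative assumptions, but the core idea matches the description above: the base weight is frozen and only the two small matrices A and B are trained.

```python
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """A frozen linear layer augmented with a trainable low-rank update.

    The effective weight is W + (alpha / r) * B @ A, where A (r x in_features)
    and B (out_features x r) are the only trainable parameters.
    """

    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # the pretrained weight stays frozen

        # A is initialized with small random values, B with zeros, so the
        # low-rank update starts at zero and the layer behaves like the base.
        self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Frozen path plus the low-rank correction (B @ A) applied to x.
        return self.base(x) + self.scaling * (x @ self.lora_A.T @ self.lora_B.T)
```

With a rank of 8 on a 4096x4096 projection, the trainable parameters shrink from roughly 16.8M to about 65K, which is the source of the memory and speed savings LoRA-based methods report.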

PERL applies LoRA to RLHF training across a range of datasets, including text summarization, harmless-response preference modeling, and UI automation tasks. In the reported results, LoRA-based reward model training uses roughly 50% less memory and runs about 90% faster, while the LoRA-adapted models match the accuracy of fully trained counterparts with about half the peak HBM usage and roughly 40% faster reinforcement learning.
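The sketch below illustrates how a LoRA-adapted reward model might be trained on pairwise preferences, in the spirit of what the paper describes. It is a toy example, not the paper's code: `ToyRewardModel`, the dimensions, and the hyperparameters are made up for illustration, and it reuses the `LoRALinear` sketch above. Only the LoRA factors receive gradients, while the backbone stays frozen.

```python
import torch
import torch.nn as nn


class ToyRewardModel(nn.Module):
    """A toy stand-in for an LLM backbone with a scalar reward head.

    Only the LoRA factors of the final projection are trainable, mirroring
    the parameter-efficient setup PERL describes for reward models.
    """

    def __init__(self, vocab_size: int = 1000, dim: int = 64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.embed.weight.requires_grad = False          # frozen "backbone"
        self.head = LoRALinear(nn.Linear(dim, 1), r=4)   # LoRA-adapted head

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # Mean-pool token embeddings, then project to a scalar reward.
        return self.head(self.embed(tokens).mean(dim=1)).squeeze(-1)


model = ToyRewardModel()
trainable = [p for p in model.parameters() if p.requires_grad]  # LoRA params only
optimizer = torch.optim.AdamW(trainable, lr=1e-4)

# One pairwise preference step: the chosen response should score higher than
# the rejected one (the Bradley-Terry style loss used for RLHF reward models).
chosen = torch.randint(0, 1000, (8, 32))    # batch of preferred responses
rejected = torch.randint(0, 1000, (8, 32))  # batch of dispreferred responses
loss = -torch.nn.functional.logsigmoid(model(chosen) - model(rejected)).mean()
loss.backward()
optimizer.step()
```

Because the optimizer only tracks the small set of LoRA parameters, optimizer state and gradient memory shrink accordingly, which is where savings of the kind reported above come from.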

Because each LoRA adapter is cheap to train, PERL also makes it practical to train ensembles of reward models, which improve cross-domain generalization and reduce the risk of reward hacking at lower computational cost. PERL thus represents meaningful progress in aligning AI with human values and preferences: it eases the computational burden of RLHF, broadens the applicability of LLMs, and sets a useful baseline for future alignment research.

PERL demonstrates the transformative potential of parameter-efficient methods in the field of artificial intelligence, with enhancements towards greater accessibility, efficiency, and alignment with human values.
