Policy Optimization via Dataset Reset (DR-PO): An AI method that leverages a generative model’s ability to reset to states drawn from historical data, improving RLHF from preference-based feedback.
