HyPO: A Combined Reinforcement Learning Algorithm Utilizing Offline Data for Comparison-based Preference Optimization and Unlabeled Online Data for KL Regularization
