Researchers have introduced CIPHER, an algorithm that personalizes the output of large language models (LLMs) by learning from the edits users make to their responses. LLMs are increasingly popular across a range of applications, and developers continue to extend their capabilities. One key challenge, however, is aligning and personalizing these models to specific user preferences and tasks.
Conventional approaches to refining these models rely on fine-tuning with human feedback, such as the comparison-based preference feedback used in reinforcement learning from human feedback (RLHF). These methods are expensive because they require human annotators. As an alternative, language agents have been developed for interactive learning, relying on user edits to produce personalized, context-appropriate responses. Building on this idea, researchers have introduced PRELUDE, a framework that uses the user’s direct edits to learn preferences. The problem is hard because user preferences are complex and vary with context.
To address this issue, a team from Cornell University’s Department of Computer Science and Microsoft Research New York developed CIPHER. The algorithm uses an LLM to infer user preferences from the edits a user makes in a given context. For a new request, CIPHER retrieves the preferences inferred in related past contexts and combines them to guide the response it generates. In the team's experiments, this approach outperformed the other algorithms tested, yielding the smallest edit distance cost.
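The retrieve, generate, and infer loop described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `PreferenceMemory`, `cipher_step`, `generate`, `get_user_edit`, and `infer_preference` are hypothetical, the bag-of-words `embed` function is a toy stand-in for whatever context representation the real system uses, and storing a preference only when the user edits the draft is a simplification made here.

```python
import math
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


def embed(text: str) -> Counter:
    """Toy bag-of-words embedding (assumption; stands in for a real encoder)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class PreferenceMemory:
    """History of (context embedding, inferred preference) pairs."""
    history: List[Tuple[Counter, str]] = field(default_factory=list)

    def add(self, context: str, preference: str) -> None:
        self.history.append((embed(context), preference))

    def retrieve(self, context: str, k: int = 5) -> List[str]:
        """Return the preferences of the k most similar past contexts."""
        query = embed(context)
        ranked = sorted(self.history,
                        key=lambda item: cosine(query, item[0]),
                        reverse=True)
        return [pref for _, pref in ranked[:k]]


def cipher_step(
    context: str,
    memory: PreferenceMemory,
    generate: Callable[[str, List[str]], str],        # hypothetical LLM call
    get_user_edit: Callable[[str, str], str],         # hypothetical user interaction
    infer_preference: Callable[[str, str, str], str], # hypothetical LLM call
    k: int = 5,
) -> Tuple[str, str]:
    """One interaction round: retrieve, generate, observe the edit, learn."""
    preferences = memory.retrieve(context, k)
    draft = generate(context, preferences)
    edited = get_user_edit(context, draft)
    if edited != draft:
        # The user changed something, so ask the LLM what preference explains it.
        memory.add(context, infer_preference(context, draft, edited))
    return draft, edited
```

In practice, `generate` and `infer_preference` would wrap prompts to the underlying LLM, and `get_user_edit` would return the text after the user has revised it.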
The research team used GPT-4 as the base LLM for CIPHER and for all comparative baselines in the study, without fine-tuning or adding any extra parameters. CIPHER was assessed against baselines that did not learn at all, that learned context-independent preferences, or that used past user edits without learning preferences. It outperformed all of these, reducing the edit distance cost by 31% in the summarization task and by 73% in the email writing task. These results were obtained by retrieving and combining five preferences (k=5), which suggests the algorithm can learn accurate preferences.
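To make the metric concrete, the sketch below computes an edit distance cost between an agent's draft and the user's edited version, along with the percentage reduction relative to a baseline. It assumes a Levenshtein distance over whitespace-separated tokens; the exact tokenization used in the study may differ, and the function names `edit_distance`, `edit_cost`, and `percent_reduction` are illustrative.

```python
from typing import Sequence


def edit_distance(a: Sequence, b: Sequence) -> int:
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x
                curr[j - 1] + 1,         # insert y
                prev[j - 1] + (x != y),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def edit_cost(draft: str, edited: str) -> int:
    """Token-level cost of the user's edit (whitespace tokens are an assumption)."""
    return edit_distance(draft.split(), edited.split())


def percent_reduction(baseline_cost: float, new_cost: float) -> float:
    """Relative cost reduction of a method versus a baseline, in percent."""
    return 100.0 * (baseline_cost - new_cost) / baseline_cost
```

A draft the user accepts unchanged has a cost of zero, so a lower cumulative cost over many rounds means the agent's responses needed less correction.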
Overall, the PRELUDE framework and the CIPHER algorithm open a promising direction for personalizing AI systems, particularly LLMs. Inferring and learning preferences from user edits can substantially improve how well an LLM's responses align with an individual user. CIPHER is also inexpensive to apply, since it requires no fine-tuning or additional parameters, and it reduced the edit distance cost more than the baseline methods it was compared against. Credit for this work goes to the researchers on the project.