Artificial Intelligence (AI) has revolutionized numerous industries, from customer service to content generation, through large language models (LLMs) that supply accurate and useful replies to human prompts. However, these models tend to favor longer responses, exhibiting an inherent length bias that complicates model evaluation.
To balance response length with quality, researchers have developed Length-Instruction Fine-Tuning (LIFT), an approach that gives models explicit length instructions in the prompt. This joint research project from Meta FAIR and New York University aims to improve models' adherence to the length constraints a user specifies. Models are fine-tuned using Direct Preference Optimization (DPO), with the training data augmented to include length instructions.
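As a rough illustration of what this kind of data augmentation involves, the sketch below prepends a length instruction to an existing prompt and builds a DPO-style preference pair in which the response that respects the limit is preferred. The helper names and the instruction wording are hypothetical, not taken from the paper, and the pairing logic is simplified.

```python
def add_length_instruction(prompt: str, max_words: int) -> str:
    """Prepend an explicit length instruction to an existing prompt.

    The instruction template here is a placeholder; the paper's wording may differ.
    """
    return f"Answer the following in at most {max_words} words.\n\n{prompt}"


def build_lift_pair(prompt: str, response_a: str, response_b: str, max_words: int) -> dict:
    """Construct a DPO preference pair where the length-compliant response is 'chosen'.

    Simplified sketch: the actual LIFT construction also accounts for the
    original quality preference between the two responses.
    """
    a_ok = len(response_a.split()) <= max_words
    b_ok = len(response_b.split()) <= max_words
    # Prefer the compliant response; fall back to the original order if
    # both (or neither) satisfy the limit.
    if a_ok and not b_ok:
        chosen, rejected = response_a, response_b
    elif b_ok and not a_ok:
        chosen, rejected = response_b, response_a
    else:
        chosen, rejected = response_a, response_b
    return {
        "prompt": add_length_instruction(prompt, max_words),
        "chosen": chosen,
        "rejected": rejected,
    }
```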
To put LIFT to the test, the researchers applied it to models such as Llama 2 and Llama 3, which were trained to handle prompts both with and without length directives. The team created augmented datasets consisting of preference pairs that reflect both length constraints and response quality. The GPT-4 Turbo model, for instance, did not perform satisfactorily, breaking length restrictions nearly half the time. LIFT-DPO models, however, showed significantly lower violation rates. Remarkably, the Llama-2-70B-Base model, when trained with standard DPO, violated length constraints 65.8% of the time; after LIFT-DPO training, that figure fell to 7.1%.
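For context on how a violation rate like the 65.8% or 7.1% figures above might be computed, here is a minimal sketch that counts how often a model's responses exceed the word limit stated in the prompt. The field names are assumptions for illustration, not the authors' evaluation code.

```python
def length_violation_rate(examples: list[dict]) -> float:
    """Fraction of responses that exceed their prompt's word limit.

    Each example is assumed to carry a 'response' string and a 'max_words'
    integer taken from the length instruction in the prompt.
    """
    if not examples:
        return 0.0
    violations = sum(
        1 for ex in examples if len(ex["response"].split()) > ex["max_words"]
    )
    return violations / len(examples)


# Example usage with toy data:
examples = [
    {"response": "A short compliant answer.", "max_words": 10},
    {"response": " ".join(["word"] * 50), "max_words": 20},
]
print(f"Violation rate: {length_violation_rate(examples):.1%}")  # 50.0%
```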
Additionally, LIFT-DPO models maintained high response quality, indicating that they can generate strong output within fixed length limits. For instance, the Llama-2-70B-Base model's win rate rose from 4.6% with conventional DPO training to 13.6% with LIFT-DPO, demonstrating LIFT's capacity to balance length control with response quality.
To conclude, the collaboration between Meta FAIR and New York University offers a potential solution to length bias in instruction-following models with the pioneering LIFT method. By introducing length constraints into training, LIFT improves controllability and quality, making it a more reliable, efficient approach for producing precise, high-quality responses from AI models. The method is a promising advancement in AI research, setting a new standard for instruction-following ability.