In information retrieval (IR), large language models (LLMs) typically rely on human-crafted prompts for precise relevance ranking. Writing these prompts demands considerable effort, making the process both time-consuming and subjective. Manual prompt engineering can be effective, but it remains labor-intensive and depends heavily on the engineer's skill, while existing automatic methods target elementary tasks such as language modeling and ignore the unique challenges of relevance ranking.
Researchers from Rutgers University and the University of Connecticut have presented a solution: APEER (Automatic Prompt Engineering Enhances LLM Reranking). APEER is an automated prompt engineering method that uses iterative feedback and preference optimization to minimize human input. It refines prompts based on performance feedback and on comparisons with preferred prompt examples, improving both the efficiency and the accuracy of LLMs on IR tasks.
APEER begins by generating prompts and then fine-tunes them through two major steps: feedback optimization and preference optimization. Feedback optimization acquires performance feedback on the current prompt and creates an enhanced version. Preference optimization uses sets of positive and negative examples to further refine the prompt.
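The two-step loop above can be sketched in code. This is a minimal, hypothetical illustration, not the authors' implementation: `llm_rewrite`, the pool bookkeeping, and the scoring function are illustrative stand-ins for real LLM calls and nDCG evaluation.

```python
def llm_rewrite(prompt, instruction):
    # Stand-in for an LLM call that rewrites a prompt under an instruction.
    # Here it just appends text so the sketch runs without any API;
    # in practice this would query a model.
    return f"{prompt} [{instruction}]"

def apeer_style_optimize(prompt, evaluate, rounds=3):
    """Hypothetical APEER-style loop: alternate feedback optimization and
    preference optimization, keeping the best-scoring prompt seen so far."""
    best, best_score = prompt, evaluate(prompt)
    positives, negatives = [prompt], []  # preferred / dispreferred prompt pools
    for _ in range(rounds):
        # Step 1: feedback optimization -- critique the current prompt
        # and produce an enhanced version.
        candidate = llm_rewrite(best, "revise using performance feedback")
        # Step 2: preference optimization -- refine again, conditioned on
        # pools of positive and negative prompt examples.
        candidate = llm_rewrite(
            candidate,
            f"imitate {len(positives)} positive, avoid {len(negatives)} negative examples",
        )
        score = evaluate(candidate)  # e.g. nDCG@10 on a validation set
        if score > best_score:
            positives.append(candidate)
            best, best_score = candidate, score
        else:
            negatives.append(candidate)
    return best, best_score
```

With `evaluate` bound to a retrieval metric computed on held-out queries, the loop returns the highest-scoring prompt found across the rounds.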
The training and validation of APEER are conducted on multiple datasets, including MS MARCO, TREC-DL, and BEIR, demonstrating the method's versatility and efficacy across varied IR tasks and LLM architectures.
APEER significantly improves LLM performance on relevance ranking tasks, showing substantial gains in metrics such as nDCG@1, nDCG@5, and nDCG@10 over traditional manual prompts. For example, APEER's prompts improved performance by an average of 5.29 points in nDCG@10 across eight BEIR datasets compared with manual prompts on the LLaMA3 model.
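For readers unfamiliar with the metric, nDCG@k measures how well a produced ranking places highly relevant documents near the top, normalized by the ideal ordering. A minimal reference computation (using linear gains; some evaluations instead use exponential gains, 2^rel - 1):

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k relevance labels,
    in the order the ranker produced them."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))

def ndcg_at_k(ranked_relevances, k):
    """nDCG@k: DCG of the produced ranking divided by the ideal DCG
    (the DCG of the same labels sorted best-first)."""
    ideal_dcg = dcg_at_k(sorted(ranked_relevances, reverse=True), k)
    return dcg_at_k(ranked_relevances, k) / ideal_dcg if ideal_dcg > 0 else 0.0
```

A perfect ranking scores 1.0; misplacing a relevant document lowers the score, which is why small average gains such as 5.29 points in nDCG@10 reflect consistently better orderings.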
APEER’s prompts also display greater transferability across diverse tasks and LLM architectures, consistently outperforming baseline methods across datasets and models such as GPT-4, LLaMA3, and Qwen2.
In conclusion, APEER offers an automated approach to prompt engineering for LLMs in IR that reduces human input while improving performance. By combining iterative feedback with preference optimization, APEER represents a significant step forward for the field, providing a scalable and effective way to optimize LLM prompts in complex relevance-ranking scenarios.