Large Language Models (LLMs) are central to advancing how machines process human language, performing tasks such as translation, summarization, and question answering. However, evaluating their performance is challenging because it demands substantial computational resources.
A major issue in evaluating LLMs is the significant cost of running large benchmark datasets. Conventional benchmarks like HELM…
