Function-calling agent models are a critical advancement in large language models (LLMs). They interpret natural language instructions and execute the corresponding API calls, enabling real-time interaction with digital services such as retrieving market data or managing social media accounts. Training these models, however, requires high-quality, diverse, and verifiable datasets. Unfortunately, many existing datasets lack thorough verification and diversity, which leads to inaccuracies and inefficiencies that severely limit the agents' adaptability and performance.
Salesforce's AI Research team has addressed these issues by developing APIGen, an automated pipeline for generating diverse and verifiable function-calling datasets. APIGen starts by sampling APIs and example query-answer pairs and formatting them into a standardized JSON format. Each generated entry then passes through a three-stage verification process to ensure data reliability and correctness: stage 1 performs a format check that ensures the correct JSON structure; stage 2 executes the function calls to verify their operational correctness; and stage 3 applies a semantic check to confirm that the function calls, their results, and the query objectives align. The output is a comprehensive dataset of 60,000 high-quality entries, covering 3,673 APIs across 21 categories, which is available on Hugging Face.
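To make the three stages concrete, here is a minimal sketch of such a verification pipeline in Python. All names here (the `API_REGISTRY`, the entry fields, the toy semantic check) are illustrative assumptions, not APIGen's actual implementation; in particular, APIGen's real semantic check uses an LLM, which is stood in for below by a crude heuristic.

```python
import json

# Hypothetical registry of executable APIs -- illustrative only.
API_REGISTRY = {
    "get_stock_price": lambda symbol: {"symbol": symbol, "price": 187.3},
}

def stage1_format_check(raw):
    """Stage 1: the entry must parse as JSON and contain the expected fields."""
    try:
        entry = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not {"query", "call"} <= entry.keys():
        return None
    return entry

def stage2_execution_check(entry):
    """Stage 2: the function call must resolve to a known API and
    execute without raising an error."""
    call = entry["call"]
    fn = API_REGISTRY.get(call.get("name"))
    if fn is None:
        return None
    try:
        return fn(**call.get("arguments", {}))
    except Exception:
        return None

def stage3_semantic_check(entry):
    """Stage 3: crude stand-in for the LLM-based semantic check --
    require every call argument to actually appear in the user query."""
    args = entry["call"].get("arguments", {})
    return all(str(v) in entry["query"] for v in args.values())

# Run one candidate entry through all three stages.
raw = json.dumps({
    "query": "What is the current price of AAPL?",
    "call": {"name": "get_stock_price", "arguments": {"symbol": "AAPL"}},
})
entry = stage1_format_check(raw)
result = stage2_execution_check(entry)
ok = entry is not None and result is not None and stage3_semantic_check(entry)
```

An entry is kept only if it survives every stage; malformed JSON fails at stage 1, calls to unknown or erroring APIs fail at stage 2, and calls whose arguments do not match the query fail at stage 3.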
Thanks to this thorough process, APIGen's datasets significantly improved model performance, achieving state-of-the-art results on the Berkeley Function-Calling Benchmark. Models trained with these datasets even outperformed multiple GPT-4 models. For example, a smaller model with only 7B parameters achieved an accuracy of 87.5%, surpassing previous top models by a substantial margin. These results show how much APIGen-generated datasets can enhance the capabilities of function-calling agents.
In conclusion, APIGen provides a novel framework for generating high-quality, diverse function-calling datasets, addressing a pressing issue in AI research. Its multi-stage verification process ensures the reliability and correctness of the data, thereby significantly improving model performance and enabling models of various sizes to achieve competitive results. APIGen is therefore a game-changer in the field of function-calling agents, demonstrating the importance of high-quality data in AI research and paving the way toward more efficient and powerful LLMs.
The researchers' study is publicly available.