Advances in Large Language Models (LLMs) have notably benefited the development of artificial intelligence, particularly the creation of agent-based systems designed to interact with diverse environments and carry out actions toward specific goals. A significant challenge is the creation of elaborate planning environments and tasks, which today relies heavily on manual design. This reliance limits the diversity and quantity of available training data and, in turn, the LLM's ability to generalize and perform across different situations.
To tackle this issue, researchers from the University of Hong Kong and Microsoft Corporation have introduced AGENTGEN, a groundbreaking framework that harnesses the power of LLMs to automate the generation of environments and related planning tasks. AGENTGEN operates in two primary stages: environment generation and task generation. The first stage uses an inspiration corpus made up of diverse text segments to create varied environment specifications. Following this, the framework generates planning tasks, ranging from simple to complex.
Much of AGENTGEN's strength lies in its ability to generate diverse environments. To achieve this, the researchers built the inspiration corpus to serve as context when generating environment specifications. For instance, a sample text segment about peanut butter powder could guide the generation of a scenario in which the agent must create a new recipe book focused on that ingredient.
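A rough sketch of this environment-generation step is shown below. The `llm_generate` helper, the prompt wording, and the JSON output schema are illustrative assumptions rather than the paper's exact implementation.

```python
import json
import random

# Hypothetical helper: `llm_generate(prompt) -> str` stands in for any
# chat-completion call; the actual prompts and schema in the paper may differ.
ENV_PROMPT = """You are designing a planning environment for an LLM agent.
Use the following text segment purely as inspiration:

"{segment}"

Return a JSON object with these keys:
  "name": a short environment name
  "description": what the world contains and how it behaves
  "actions": a list of actions the agent can take
  "observations": what the agent can perceive
"""

def generate_environment(inspiration_corpus, llm_generate):
    """Sample one text segment and ask the LLM for an environment specification."""
    segment = random.choice(inspiration_corpus)             # diverse seed text
    raw = llm_generate(ENV_PROMPT.format(segment=segment))
    return json.loads(raw)                                  # parsed environment spec

# A segment about peanut butter powder could, for example, yield the
# recipe-book environment mentioned above.
corpus = ["Peanut butter powder is a versatile, low-fat baking ingredient..."]
# env = generate_environment(corpus, llm_generate)
```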
The task generation process in AGENTGEN uses a method called BI-EVOL (bidirectional evolution), which evolves each task in two directions, toward simpler and toward more complex variants, yielding a task set with a smooth difficulty gradient that supports gradual learning for the LLMs. In total, the research team generated 592 unique environments, each with 20 tasks, and 7,246 high-quality trajectories for training.
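To make the bidirectional evolution idea concrete, here is a minimal sketch under the same assumptions (a generic `llm_generate` helper and illustrative prompt wording, not the paper's actual prompts): each seed task is rewritten once toward an easier variant and once toward a harder one, and the process repeats for a few rounds.

```python
# Minimal sketch of bidirectional task evolution; prompt wording and the
# `llm_generate` helper are assumptions, not the paper's exact method.
EASIER_PROMPT = (
    "Here is a planning task for the environment '{env}':\n{task}\n"
    "Rewrite it as a simpler task by removing constraints or sub-goals."
)
HARDER_PROMPT = (
    "Here is a planning task for the environment '{env}':\n{task}\n"
    "Rewrite it as a more challenging task by adding constraints or sub-goals."
)

def bi_evol(env_name, seed_tasks, llm_generate, rounds=2):
    """Grow a task set in both difficulty directions from a few seed tasks."""
    tasks = list(seed_tasks)
    frontier = list(seed_tasks)
    for _ in range(rounds):
        evolved = []
        for task in frontier:
            easier = llm_generate(EASIER_PROMPT.format(env=env_name, task=task))
            harder = llm_generate(HARDER_PROMPT.format(env=env_name, task=task))
            evolved.extend([easier, harder])
        tasks.extend(evolved)   # keep every difficulty level in the final set
        frontier = evolved      # evolve the newest variants in the next round
    return tasks
```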
AGENTGEN was tested on the AgentBoard platform and showed considerable improvements in the planning abilities of LLM-based agents. The AGENTGEN-tuned Llama-3 8B model outperformed GPT-3.5, and on certain tasks it even surpassed GPT-4. Notably, AGENTGEN substantially increased success rates on both in-domain and out-of-domain tasks.
AGENTGEN’s ability to generalize across models and tasks was also observed. For example, Llama-3 8B showed an increase of 10.0 in success rate and 9.95 in progress rate after training on AGENTGEN data. These findings indicate that AGENTGEN can effectively improve the planning performance of a variety of LLMs.
In conclusion, the AGENTGEN framework has the potential to transform how LLM-based agents are trained and applied. By automating the generation of diverse environments and planning tasks, it overcomes the limitations of manual design and supports the development of intelligent systems that can perform complex planning with greater accuracy and efficiency.