Since the introduction of ChatGPT, AI applications have increasingly adopted Retrieval-Augmented Generation (RAG), and much current work focuses on improving RAG systems to shape the next generation of AI applications. Ideal AI agents extend the capabilities of a language model (LM) to solve real-world problems, especially those that require the ability to reason, plan, and execute tasks effectively.
For an AI agent to interact successfully with complex environments, it must be able to reason independently and assist users in accomplishing a variety of tasks. A close coupling between reasoning and action helps AI agents learn new tasks quickly. AI agents must also be able to adapt their plans based on new information or feedback; without this ability, they may fail to operate correctly.
To address these challenges, researchers from IBM and Microsoft have proposed two distinct AI agent architectures for fulfilling complex goals: Single Agent Architectures (SSAs) and Multi-Agent Architectures (MAAs). Both feature enhanced reasoning, planning, and tool execution capabilities. SSAs, as the name suggests, depend on a single language model and perform all tasks independently. MAAs, by contrast, comprise two or more agents, each using either the same or a different language model.
SSAs, while lacking a feedback mechanism from other AI agents, can accept and incorporate feedback from users to guide them toward their goals. In MAAs, on the other hand, each agent has a unique identity or set of characteristics, and agents can employ the same or different tools. MAAs are organized along a spectrum that runs from vertical to horizontal structures, with most architectures lying somewhere between these two ends.
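The distinction between the two architectures can be illustrated with a minimal Python sketch. Everything here is hypothetical and chosen only for illustration: the `Agent` class, the `echo_lm` stand-in for a language model, and the three `run_*` functions are not taken from the researchers' work. The sketch also assumes one common reading of the organizational spectrum, in which a vertical structure has a lead agent that collects reports from the others and a horizontal structure treats all agents as peers in a shared thread.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical stand-in for a language model: takes a prompt, returns text.
LM = Callable[[str], str]


def echo_lm(prompt: str) -> str:
    """Toy model so the sketch runs without any external service."""
    return f"response({prompt})"


@dataclass
class Agent:
    name: str                      # the agent's identity or persona
    lm: LM                         # agents may share a model or use different ones
    tools: List[str] = field(default_factory=list)  # tool names (unused in this toy)

    def act(self, task: str) -> str:
        return f"{self.name}: {self.lm(task)}"


def run_single(agent: Agent, task: str, user_feedback: List[str]) -> str:
    """Single agent: no peer agents, only feedback from the user."""
    output = agent.act(task)
    for note in user_feedback:
        output = agent.act(f"{task} | feedback: {note}")
    return output


def run_vertical(leader: Agent, workers: List[Agent], task: str) -> str:
    """Assumed vertical structure: workers report to one lead agent."""
    reports = [worker.act(task) for worker in workers]
    return leader.act("summarize: " + "; ".join(reports))


def run_horizontal(agents: List[Agent], task: str) -> List[str]:
    """Assumed horizontal structure: peers contribute to a shared thread."""
    thread: List[str] = []
    for agent in agents:
        context = task if not thread else f"{task} | seen: {thread[-1]}"
        thread.append(agent.act(context))
    return thread
```

In this toy setup, the single agent revises its output only when the user supplies feedback, while the multi-agent variants differ in who sees whose output: the vertical run funnels every report through the leader, and the horizontal run lets each peer read the previous peer's message.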
Methods such as Language Agent Tree Search (LATS) and MetaGPT illustrate the two architectures. LATS is a single-agent technique that integrates planning, action, and reasoning through trees and uses an LM-based strategy to explore potential outcomes before selecting an action. MetaGPT addresses the challenge of unproductive chatter among agents in MAAs by requiring agents to produce structured outputs such as documents and diagrams. In tests against the HumanEval and MBPP benchmarks, MetaGPT’s multi-agent architecture demonstrated superior results compared to SSAs.
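The idea of exploring potential outcomes in a tree before committing to an action can be sketched in a few lines of Python. This is a simplified best-first search, not the actual LATS procedure: the `propose`, `score`, and `is_goal` callables are hypothetical placeholders for what would be LM calls in a real agent, and the toy target plan exists only so the example runs on its own.

```python
import heapq
from typing import Callable, List, Optional, Tuple

State = Tuple[str, ...]  # a state is the sequence of actions taken so far


def tree_search(
    propose: Callable[[State], List[str]],  # candidate next actions (an LM in practice)
    score: Callable[[State], float],        # value estimate (an LM in practice)
    is_goal: Callable[[State], bool],
    budget: int = 50,                       # maximum number of node expansions
) -> Optional[State]:
    """Expand the most promising state first until a goal is found."""
    frontier = [(-score(()), ())]  # max-heap via negated scores
    expansions = 0
    while frontier and expansions < budget:
        _, state = heapq.heappop(frontier)
        if is_goal(state):
            return state
        expansions += 1
        for action in propose(state):
            child = state + (action,)
            heapq.heappush(frontier, (-score(child), child))
    return None


# Toy problem (assumed for illustration): recover a three-step plan.
TARGET: State = ("search", "read", "answer")


def toy_propose(state: State) -> List[str]:
    return ["search", "read", "answer"]


def toy_score(state: State) -> float:
    # +1 per step matching the target, -1 per mismatch or extra step.
    matches = sum(1 if a == t else -1 for a, t in zip(state, TARGET))
    return matches - max(0, len(state) - len(TARGET))


def toy_is_goal(state: State) -> bool:
    return state == TARGET


plan = tree_search(toy_propose, toy_score, toy_is_goal)
```

With the toy scorer, the search follows the highest-valued branch at each level and reaches the target plan after three expansions; a real agent would replace the scorer and proposer with model calls and would typically add feedback from the environment.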
In conclusion, the proposed AI agent architectures, SSAs and MAAs, are designed to achieve complex goals and exhibit strong performance on a variety of tasks that require reasoning and tool execution. Evaluating agents remains difficult, however: many agent implementations are introduced alongside their own unique benchmarks, which makes it hard to compare one implementation with another.