
Researchers from Carnegie Mellon University Investigate Expert Recommendations and Strategic Deviations in Multi-Agent Imitation Learning

Carnegie Mellon University researchers are exploring the complexities of multi-agent imitation learning (MAIL), a coordination setting in which a mediator guides a group of strategic agents (like drivers on a road network) through action recommendations, despite lacking knowledge of their utility functions. The central challenge of this approach lies in specifying what makes those recommendations good, that is, in choosing the objective against which the mediator's learned behavior should be measured.

MAIL strategies include single-agent imitation learning, which is simple and efficient but prone to compounding errors (see the sketch below); interactive methods such as inverse reinforcement learning, which mitigate those errors at the cost of sample efficiency; and inverse game theory, a less commonly explored strategic approach to multi-agent imitation learning in Markov Games.
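To make the compounding-error problem concrete, here is a minimal sketch of single-agent behavioral cloning, using an invented toy dataset and policy class; none of these modeling choices come from the paper.

```python
# A minimal behavioral-cloning sketch: treat expert demonstrations as a
# supervised dataset and fit a classifier to predict the expert's action.
# The features, labels, and policy class below are illustrative only.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical demonstrations: 4-d state features, binary actions.
states = rng.normal(size=(500, 4))
expert_actions = (states[:, 0] + 0.5 * states[:, 1] > 0).astype(int)

# BC reduces imitation to classification on the expert's own state
# distribution.
policy = LogisticRegression().fit(states, expert_actions)
print("training accuracy:", policy.score(states, expert_actions))

# The weakness: at rollout time, small mistakes push the learner into
# states the expert never visited, where nothing bounds its error rate --
# the compounding-errors problem that interactive methods address.
```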

Against this backdrop, the researchers propose an alternative objective known as the regret gap, which accounts for the agents' potential deviations from the mediator's recommendations. They show the regret gap is fundamentally harder to control than the value gap, the difference between the value achieved under the expert mediator and that achieved under the learned one: single-agent imitation techniques suffice to drive the value gap to zero, yet even a learner that exactly matches the expert's value can leave the regret gap arbitrarily large.
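Roughly, and with notation paraphrased rather than quoted from the paper, the two objectives can be written as follows, where J(σ) is the expected value when all agents follow mediator policy σ, J_i is agent i's expected utility, σ_E is the expert mediator, and φ_i ranges over agent i's possible deviations:

```latex
% Paraphrased notation, not verbatim from the paper.
\[
\text{value gap:}\qquad J(\sigma_E) - J(\sigma)
\]
\[
\text{regret gap:}\qquad
\max_{i}\max_{\phi_i}\bigl(J_i(\phi_i \circ \sigma) - J_i(\sigma)\bigr)
\;-\;
\max_{i}\max_{\phi_i}\bigl(J_i(\phi_i \circ \sigma_E) - J_i(\sigma_E)\bigr)
\]
```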

In practice, the value gap is the ‘weaker’ objective, but also the one most readily attacked in real-world contexts: it can be driven down using algorithms derived from the single-agent imitation techniques of behavioral cloning (BC) and inverse reinforcement learning (IRL). To address the harder regret gap, the researchers adapted these two strategies to multi-agent environments, developing two new methods named MALICE (Multi-Agent Aggregation of Losses to Imitate Cached Experts) and BLADES. Under suitable assumptions, both algorithms come with provable bounds on the regret gap.

MALICE extends the single-agent ALICE (Aggregation of Losses to Imitate Cached Experts) algorithm, which re-weights behavioral cloning losses according to how often the learner, rather than the expert, visits each state, and applies the same principle in the MAIL setting, as sketched below.
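Here is a minimal sketch of that re-weighting idea, assuming, purely for illustration, that per-state visitation frequencies for learner and expert are available as arrays; the paper's actual estimators and guarantees are more involved.

```python
# Sketch of an ALICE-style importance-weighted imitation loss. The
# visitation frequencies below would in practice have to be estimated;
# treating them as known arrays is an illustrative assumption.
import numpy as np

def reweighted_bc_loss(per_state_losses, learner_visitation, expert_visitation):
    """Up-weight imitation loss on states the learner visits more often
    than the expert did, so off-demonstration errors are not ignored."""
    weights = learner_visitation / np.clip(expert_visitation, 1e-8, None)
    return float(np.mean(weights * per_state_losses))

# Hypothetical values for three states: the learner over-visits state 2,
# so the loss it incurs there is amplified.
losses = np.array([0.1, 0.05, 0.4])
d_learner = np.array([0.3, 0.2, 0.5])
d_expert = np.array([0.4, 0.4, 0.2])
print(reweighted_bc_loss(losses, d_learner, d_expert))
```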

The key to managing the regret gap lies in estimating the expert's actions in counterfactual states, those reached when agents deviate from the recommendations, as the regret gap cannot be controlled through environmental interaction alone; the learner needs either broad coverage in the demonstrations or interactive access to an expert that can be queried in new states. Meeting that requirement is what yields robust, efficient guarantees in strategic settings (see the sketch below).
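Below is a minimal sketch of the interactive, DAgger-style loop that the queryable-expert route suggests, in the spirit of (though not taken from) BLADES; `env`, `expert`, and `fit_policy` are hypothetical stand-ins, and the multi-agent recommendation machinery is omitted.

```python
# Sketch of interactive imitation with a queryable expert: roll out the
# current policy, ask the expert what it would recommend in each state
# actually reached (including counterfactual ones plain BC never sees),
# and retrain on the aggregated labels. All objects here are hypothetical.
def interactive_imitation(env, expert, fit_policy, n_rounds=10, horizon=50):
    dataset = []                      # aggregated (state, expert action) pairs
    policy = fit_policy(dataset)      # initial policy from an empty dataset
    for _ in range(n_rounds):
        state = env.reset()
        for _ in range(horizon):
            dataset.append((state, expert.act(state)))  # counterfactual query
            state = env.step(policy.act(state))  # simplified: returns next state
        policy = fit_policy(dataset)  # retrain on everything gathered so far
    return policy
```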

Looking ahead, the researchers are focused on refining these algorithms and putting them into practice. The work has shed light on both the difficulty and the promise of multi-agent imitation learning as a strategy for coordinating groups of strategic agents, and efforts continue toward effective methods for this challenge.
