Researchers from MIT, Harvard University, and the University of Washington have developed a new method for training AI agents with reinforcement learning. Their approach replaces the time-consuming process of having a human expert design a reward function with feedback crowdsourced from non-expert users.
Traditionally, reinforcement learning relies on a reward function, designed by experts, that guides the AI agent toward its objective. However, designing such a function can be inefficient and hard to scale for complex, multi-step tasks. The new approach instead guides the agent's learning with feedback gathered from many non-expert users.
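To make the contrast concrete, here is a minimal sketch of what a hand-designed reward function looks like for a toy "reach the target" task. The task and function are illustrative assumptions, not from the paper; the point is that an expert must craft such a formula for every task, and shaping one for a multi-step behavior (say, a robot opening a drawer) means hand-tuning many such terms.

```python
# A hand-designed reward function for a hypothetical "reach the target"
# task on a number line. Illustrative only; not the researchers' code.
def reward(position: float, target: float = 10.0) -> float:
    """Dense reward: higher (less negative) as the agent nears the target."""
    return -abs(target - position)

assert reward(10.0) == 0.0        # at the goal: maximum reward
assert reward(4.0) < reward(9.0)  # closer states score higher
```

Writing one line like this is easy; the difficulty the researchers cite is that real tasks need many interacting terms, each tuned by trial and error.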
While previous methods have also used non-expert feedback, this new approach allows the agent to learn more quickly. Even when the data collected from users is "noisy", that is, punctuated with errors, the AI agent can still learn effectively. The method also supports asynchronous training, so users worldwide can contribute to the AI's learning process at any time.
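A small simulation can show why noisy crowdsourced labels still carry a useful signal. The setup below is an illustrative assumption, not the paper's method: each simulated non-expert answers a yes/no feedback query and is wrong 30% of the time, yet aggregating many such answers by majority vote recovers the true answer almost always.

```python
import random

def noisy_vote(true_label: bool, error_rate: float, rng: random.Random) -> bool:
    """Return the true label, flipped with probability `error_rate`."""
    return true_label if rng.random() > error_rate else not true_label

def majority(labels: list) -> bool:
    """True if more than half of the labels are True."""
    return sum(labels) > len(labels) / 2

rng = random.Random(0)
TRUE_ANSWER = True   # ground truth for one hypothetical feedback query
ERROR_RATE = 0.3     # each simulated non-expert is wrong 30% of the time

# Each trial aggregates 25 noisy labels; count how often the majority
# vote matches the ground truth.
trials = 1000
correct = sum(
    majority([noisy_vote(TRUE_ANSWER, ERROR_RATE, rng) for _ in range(25)])
    for _ in range(trials)
)
print(correct / trials)  # close to 1.0
```

This is the general statistical intuition behind tolerating individual errors; HuGE's actual mechanism for handling noise is more involved than a simple vote.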
“We propose a way to scale robot learning by crowdsourcing the design of the reward function and making it possible for nonexperts to provide useful feedback,” said Pulkit Agrawal, an assistant professor in MIT’s Department of Electrical Engineering and Computer Science who leads the Improbable AI Lab.
This new approach has been named HuGE, or Human Guided Exploration. It consists of two parts: a goal selector algorithm that uses the crowdsourced feedback to steer the AI's exploration, and the agent itself, which explores on its own in a self-supervised learning process guided by the goal selector.
The researchers demonstrated HuGE's effectiveness on a range of simulated and real-world tasks. Agents trained with HuGE learned faster than those trained with previous reinforcement learning methods, and the crowdsourced feedback led to better performance than synthetic feedback generated by the researchers.
In the future, this research could be applied to autonomous robots in users' homes, where residents would not have to physically show the robot how to perform tasks. The researchers also hope to expand HuGE's capabilities by enabling it to learn from other forms of communication, such as written instructions and physical interactions.