A team of researchers from MIT, Harvard University, and the University of Washington has developed a novel reinforcement-learning technique that uses crowdsourced feedback. The technique lets an AI learn complex tasks more quickly, without relying on an expertly designed reward function. Instead of a conventional reward function crafted by dedicated human experts, feedback gathered from many non-expert users guides the AI throughout its learning process toward its objective. Although other methods have also tried to use non-expert feedback, this approach learns rapidly despite the frequent mistakes in user-generated data. Pulkit Agrawal, an assistant professor in the MIT Department of Electrical Engineering and Computer Science, believes the new method will scale up robot learning and make it possible for non-experts to provide useful feedback. The technique may eventually enable robots to learn specific tasks in a user’s home without physical examples.
In this method, the reward function directs the exploration process rather than dictating the exact steps for achieving the objective. As a result, even imprecise and noisy supervision lets the AI learn effectively. The machine-learning method, which the researchers call HuGE (Human Guided Exploration), has two separate parts. One part is a goal-selector algorithm that is continually updated with crowdsourced human feedback and guides the AI’s exploration. The other lets the AI explore on its own, supervised by the goal selector. Images or videos of the AI’s actions are collected and used to update the goal selector.
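The two-part loop described above can be sketched in simplified form. This is not the authors' actual implementation: the class names, the one-dimensional state space, and the pairwise-comparison feedback format are illustrative assumptions, used only to show how noisy non-expert preferences can steer a goal selector while the agent explores on its own.

```python
class GoalSelector:
    """Scores candidate goals from noisy crowdsourced comparisons.

    Hypothetical sketch: a human is shown two visited states and picks
    the one that looks closer to the objective; the chosen state's
    score rises. Occasional wrong picks only add noise, since the
    selector merely biases exploration rather than defining a reward.
    """

    def __init__(self, goals):
        self.scores = {g: 0.0 for g in goals}

    def update(self, preferred, rejected):
        # One unit of crowdsourced feedback: nudge scores apart.
        self.scores[preferred] += 1.0
        self.scores[rejected] -= 1.0

    def select(self):
        # Direct exploration toward the currently best-rated goal.
        return max(self.scores, key=self.scores.get)


def explore(agent_pos, goal, step=1):
    """Self-supervised exploration: take one step toward the goal."""
    if agent_pos < goal:
        return agent_pos + step
    if agent_pos > goal:
        return agent_pos - step
    return agent_pos


# Toy usage on a 1-D line with candidate goals 0, 5, and 10.
selector = GoalSelector([0, 5, 10])
selector.update(10, 0)   # a user judged state 10 closer to the target
selector.update(10, 5)   # another user agreed
pos = explore(3, selector.select())
```

Even with this toy setup, the key property of the approach is visible: the feedback only reorders candidate goals, so a few mislabeled comparisons shift scores slightly without dictating the agent's steps.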
When tested on simulated and real-world tasks, HuGE enabled AI agents to reach their goals more quickly than other methods did. Moreover, data generated by non-expert users yielded better performance than synthetic data created and labeled by the researchers. In subsequent work, the researchers improved HuGE so that, once an agent has learned a task, it can autonomously reset the environment and continue learning. They aim to refine HuGE further so it can learn from other forms of communication, such as natural language and physical interaction with the robot, and they are also keen to use the method to teach multiple robots simultaneously. The research was partially funded by the MIT-IBM Watson AI Lab.