Teaching an AI agent a new task is often slow and labor-intensive: a human expert typically designs a reward function that motivates the agent to explore possible actions, then updates that function iteratively. Researchers from the Massachusetts Institute of Technology, Harvard University, and the University of Washington have developed a reinforcement learning approach that bypasses this laborious process by leveraging crowdsourced feedback from non-expert users.
Instead of relying on a carefully crafted reward function, the researchers’ method uses crowdsourced feedback to guide the AI agent toward its goal. This allows for faster learning, since the agent can continue to explore based on the feedback even when some of the advice from human contributors is wrong. The method also works asynchronously, which widens its reach: contributors anywhere in the world can take part in guiding the learning process.
The method, called HuGE (Human Guided Exploration), combines two separate but interacting algorithms. One of them, the “goal selector,” uses crowdsourced human feedback not as a reward function but as guidance that directs the agent toward more targeted exploratory steps on the way to the desired goal.
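The idea of using noisy human feedback to choose where to explore, rather than as a reward, can be illustrated with a toy sketch. This is not the researchers’ implementation. The one-dimensional corridor, the simulated annotator, the Bradley-Terry-style score update, and every name below (`GoalSelector`, `noisy_human_prefers_first`, `run_huge_sketch`) are assumptions made purely for illustration. The sketch also assumes the agent can return to any previously visited state at will, which a real system would have to learn.

```python
import math
import random


def noisy_human_prefers_first(a, b, goal, error_rate, rng):
    """Hypothetical annotator: is state a closer to the goal than state b?

    With probability error_rate the answer is flipped, mimicking
    mistakes made by non-expert contributors.
    """
    truth = abs(goal - a) < abs(goal - b)
    return (not truth) if rng.random() < error_rate else truth


class GoalSelector:
    """Keeps one score per state, learned from pairwise comparisons."""

    def __init__(self, n_states, lr=0.5):
        self.scores = [0.0] * n_states
        self.lr = lr

    def update(self, a, b, a_preferred):
        # Logistic (Bradley-Terry style) step: move the two scores toward
        # agreement with the label. A single wrong label is outweighed
        # once enough correct labels have arrived.
        p_a = 1.0 / (1.0 + math.exp(self.scores[b] - self.scores[a]))
        grad = (1.0 if a_preferred else 0.0) - p_a
        self.scores[a] += self.lr * grad
        self.scores[b] -= self.lr * grad

    def select(self, visited):
        # Choose the visited state that feedback ranks as most promising.
        return max(visited, key=lambda s: self.scores[s])


def run_huge_sketch(n_states=30, goal=25, error_rate=0.2,
                    rounds=200, explore_steps=5, seed=0):
    """Return the round in which the goal was reached, or None."""
    rng = random.Random(seed)
    selector = GoalSelector(n_states)
    visited = {0}
    for round_idx in range(rounds):
        # 1. Collect a few noisy comparisons between visited states.
        if len(visited) >= 2:
            for _ in range(5):
                a, b = rng.sample(sorted(visited), 2)
                label = noisy_human_prefers_first(a, b, goal, error_rate, rng)
                selector.update(a, b, label)
        # 2. Feedback only picks the starting point for exploration.
        state = selector.select(sorted(visited))
        # 3. Explore at random from there; no reward signal is used.
        for _ in range(explore_steps):
            state = min(n_states - 1, max(0, state + rng.choice((-1, 1))))
            visited.add(state)
            if state == goal:
                return round_idx + 1
    return None
```

Calling `run_huge_sketch()` returns the number of rounds the toy agent needed to reach the goal, or `None` if it did not get there within the round budget. Raising `error_rate` is one way to see how much annotator noise this simplified selector tolerates.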
The method has proven successful in both simulated and real-world tests. The AI agent effectively learned tasks that require long sequences of actions, such as navigating large mazes and meticulously arranging blocks. In real-world tests, HuGE was used to train robotic arms for tasks such as drawing the letter “U” and picking and placing objects. These experiments, which incorporated feedback from non-expert users in 13 countries on three continents, indicated that HuGE helped the AI learn to complete tasks faster than other methods.
The researchers intend to refine HuGE for even more seamless learning by enabling the AI to learn from different forms of communication such as natural language and physical interactions with robots. Simultaneous learning by multiple AI agents is also an area of interest. The ultimate goal is to create AI agents that can perform specific tasks in a user’s home without requiring the user to demonstrate each task physically. The AI agent will thus be able to explore autonomously, guided by crowdsourced feedback. This project received funding from the MIT-IBM Watson AI Lab.