Reinforcement learning, which involves teaching an AI agent a new task through trial and error, often requires a human expert to design and repeatedly modify the reward function. This can be time-consuming, inefficient and difficult to scale up, particularly when the task is highly complex and involves several stages. In response to these issues, researchers at MIT, the University of Washington and Harvard University have been developing a new type of reinforcement learning that harnesses feedback from nonexpert users to guide the AI agent in the right direction.
Unlike other reinforcement learning techniques, which can be derailed by the noisy, error-prone labels that crowdsourcing from nonexperts typically produces, this new approach allows the agent to learn swiftly and efficiently despite imperfect feedback. Furthermore, the feedback can be gathered asynchronously, so nonexpert users from around the globe can participate in training the agent.
This method could eventually allow robots to quickly learn to perform specific tasks in a user's home, without requiring the homeowner to physically demonstrate each individual task. The robot could instead work out how to complete tasks autonomously, with nonexpert, crowdsourced feedback guiding it along the way.
The Human Guided Exploration (HuGE) method splits the process into two parts, each driven by its own algorithm. The first is a goal selector, which is continually updated with crowdsourced human feedback. Rather than serving as a reward function, this feedback shapes the agent's exploration: nonexpert users lay a breadcrumb trail that incrementally guides the AI agent toward its objective. Meanwhile, the agent explores on its own in a self-supervised manner, steered by the goal selector. It collects images and video clips of the actions it attempts, which are later used to update the goal selector and to narrow the regions the agent explores.
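To make that division of labor concrete, here is a deliberately simplified sketch in Python of a HuGE-style loop. It is not the authors' implementation: the toy 2D navigation environment, the logistic goal selector `w`, the `annotator_prefers` stand-in for a human labeler, and all parameters are illustrative assumptions chosen so the example runs on its own with only NumPy.

```python
# Minimal sketch of the HuGE idea: crowdsourced pairwise comparisons train a
# goal selector that steers exploration, while the agent learns only from its
# own self-supervised experience. Environment, selector, and annotator are
# all illustrative assumptions, not taken from the paper.
import numpy as np

rng = np.random.default_rng(0)
GOAL = np.array([8.0, 8.0])        # hypothetical task: reach this point

def step(state, action):
    """Toy 2D navigation dynamics with a little noise."""
    return np.clip(state + 0.5 * action + rng.normal(0, 0.05, 2), 0, 10)

# --- Goal selector: a logistic ranking model trained on pairwise feedback ---
w = np.zeros(2)                    # score(s) = w . s, deliberately simple

def annotator_prefers(a, b):
    """Stand-in for a nonexpert human asked 'which state looks closer to
    done?' Noisy on purpose, since crowdsourced labels are imperfect."""
    closer = np.linalg.norm(a - GOAL) < np.linalg.norm(b - GOAL)
    return closer if rng.random() > 0.1 else not closer

def update_goal_selector(a, b, lr=0.05):
    """One logistic-regression step on a single pairwise comparison."""
    global w
    label = 1.0 if annotator_prefers(a, b) else 0.0
    p = 1.0 / (1.0 + np.exp(-(w @ (a - b))))
    w += lr * (label - p) * (a - b)

# --- Self-supervised exploration steered by the goal selector ---
visited = [np.zeros(2)]
for episode in range(200):
    # Restart from the frontier state the selector currently scores highest,
    # then explore randomly around it (the "breadcrumb trail").
    scores = np.array([v @ w for v in visited])
    state = visited[int(np.argmax(scores))].copy()
    for _ in range(10):
        state = step(state, rng.normal(0, 1, 2))
        visited.append(state.copy())
    # Occasionally ask the (simulated) crowd to compare two visited states;
    # feedback can arrive asynchronously, so sparse queries are fine.
    if episode % 5 == 0:
        i, j = rng.choice(len(visited), 2, replace=False)
        update_goal_selector(visited[i], visited[j])

best = visited[int(np.argmax([v @ w for v in visited]))]
print("best state found:", best.round(2),
      "distance to goal:", round(np.linalg.norm(best - GOAL), 2))
```

The sketch tries to preserve the key design point: the noisy human comparisons influence only which states the agent explores from, never the learning signal itself, so occasional labeling errors slow the search down rather than corrupting what the agent learns.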
This method was tested in both simulated and real-world settings, where it quickly and effectively learned tasks requiring long sequences of actions. In the real-world experiments, the technique was used to teach robotic arms to draw the letter 'U' and to pick up and move objects, using data crowdsourced from over 100 nonexpert users in 13 countries across three continents. In both sets of experiments, HuGE helped agents learn the tasks more swiftly than other methods.
In the future, the researchers hope to extend HuGE so the agent can learn from other forms of communication, such as natural language and physical interaction with the robot. They are also interested in using the method to train multiple agents at once.