As robots are increasingly deployed for complex household tasks, engineers at MIT are working to equip them with the common-sense knowledge they need to adapt swiftly when faced with disruptions. The researchers' newly developed method combines robot motion data with the common-sense knowledge of large language models (LLMs).
The new approach allows a robot to divide a complicated household task into smaller, manageable subtasks. If the robot is disrupted in the course of a subtask, it consults the LLM, adjusts to the change, and continues the task without having to start from scratch.
Yanwei Wang, a Ph.D. student in MIT’s Department of Electrical Engineering and Computer Science (EECS), highlights the significance of the new approach. He emphasized that despite the efficacy of imitation learning, in which a robot learns by mimicking demonstrations, small errors can accumulate over time and result in serious execution issues.
The study aims to demonstrate how a robot can adapt to small errors, correct them in real time, and continue with the task, all without human intervention. The team will present the work at the International Conference on Learning Representations (ICLR) in May.
The researchers illustrated their method with a straightforward task: transferring marbles from one bowl to another. In guiding the robot through it, the engineers recognized that executing the task involved a sequence of interconnected subtasks.
The team found that an LLM could automatically generate the list of subtasks required for a given task. By aligning the robot’s task execution with this LLM-generated list, they enabled the robot to identify its progress through the task sequence on its own and recover from disruptions.
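To make the idea concrete, here is a minimal sketch of how such a subtask list might be obtained from an LLM. The prompt wording, model name, and OpenAI-style client are illustrative assumptions, not the researchers’ actual setup.

```python
# Illustrative sketch: asking an LLM to decompose a task into subtasks.
# The model name, prompt, and OpenAI-style API are assumptions for
# demonstration; the paper's actual pipeline may differ.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def get_subtask_list(task_description: str) -> list[str]:
    """Query an LLM for an ordered list of subtasks, one per line."""
    prompt = (
        f"List, in order, the subtasks a robot arm must perform to "
        f"'{task_description}'. Reply with one short subtask label per line."
    )
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    text = response.choices[0].message.content
    return [line.strip("- ").strip() for line in text.splitlines() if line.strip()]

# e.g. get_subtask_list("scoop marbles from one bowl and pour them into another")
# might return something like: ["reach", "scoop", "transport", "pour"]
```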
The researchers created an algorithm that connects the LLM’s verbal label for each subtask to the robot state, represented by the robot’s exact position or an image. This process is known as “grounding.” Grounding lets the robot identify where it is in the task sequence in real time: after each disruption, it can relate its physical state to the LLM’s common-sense knowledge of the task plan and adjust accordingly.
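One simple way to picture such a grounding, sketched below under assumptions of our own rather than the paper’s published code, is to label each demonstration state with its subtask and classify new states by their nearest labeled neighbor.

```python
# Hypothetical grounding sketch: map a robot state (e.g. end-effector pose)
# to an LLM subtask label via nearest-neighbor lookup over labeled
# demonstration states. A stand-in for the paper's learned grounding.
import numpy as np

class GroundingClassifier:
    def __init__(self, demo_states: np.ndarray, demo_labels: list[str]):
        # demo_states: (N, D) array of states recorded during demonstrations
        # demo_labels: the subtask label assigned to each recorded state
        self.states = demo_states
        self.labels = demo_labels

    def current_subtask(self, state: np.ndarray) -> str:
        """Return the subtask label of the closest demonstration state."""
        dists = np.linalg.norm(self.states - state, axis=1)
        return self.labels[int(np.argmin(dists))]

# After a disturbance, the robot queries the classifier with its new state
# to find where it is in the subtask sequence, then resumes from there.
```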
The researchers tested their method on a robotic arm trained to scoop and transport marbles. After a few guided demonstrations, they let the robot carry out the task independently while nudging it off its path and knocking marbles out of its scoop.
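The recovery behavior during such a test can be pictured as a loop like the following sketch, which builds on the grounding sketch above; `policy_for`, `read_robot_state`, and `subtask_done` are hypothetical placeholders standing in for the robot’s learned controllers, not the authors’ API.

```python
# Hypothetical execute-and-recover loop. `policy_for`, `read_robot_state`,
# and `subtask_done` are placeholder helpers, not the authors' actual code.
def run_with_recovery(subtasks: list[str], grounder: GroundingClassifier):
    i = 0
    while i < len(subtasks):
        state = read_robot_state()
        grounded = grounder.current_subtask(state)
        if grounded != subtasks[i]:
            # A disturbance pushed the robot into a different subtask's
            # region; jump to that point in the plan instead of restarting.
            i = subtasks.index(grounded)
        policy_for(subtasks[i]).step(state)
        if subtask_done(subtasks[i], state):
            i += 1  # advance only once the current subtask has succeeded
```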
Rather than restarting the entire task or pressing on without the marbles, the robot self-corrected after each disruption and completed the task. Wang concluded that the method eliminates the need for human intervention to rectify failures, converting training data into robust robot behavior that can accomplish complex tasks despite external disruptions.