Robots are becoming increasingly adept at complex household tasks, from cleaning up messes to serving meals. However, handling unexpected disturbances or difficulties mid-task remains a challenge. Something as common as a nudge or a slight mistake that pushes the robot off its expected path can force it to restart the task from the beginning, a significant barrier to everyday utility.
To address this problem, engineers at MIT are working to equip robots with a form of ‘common sense’: the capacity to adapt and overcome obstacles without going back to square one. The new approach uses large language models (LLMs) to break tasks down into manageable ‘subtasks’, enabling robots to recover from problems within an individual step rather than derailing the overall procedure.
An LLM parses a complex task into a sequence of smaller steps, giving the robot a logical order of subtasks to follow to complete the overall task. For instance, for a task like moving marbles from one bowl to another, the LLM would identify ‘reaching’, ‘scooping’, ‘transporting’, and ‘pouring’ as the individual actions involved. If the robot is disrupted during one of these subtasks, it can correct the issue at that point rather than starting over from the beginning.
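As a rough illustration of this decomposition step, the Python sketch below prompts an LLM to list a task’s subtasks. The prompt wording and the query_llm helper are hypothetical stand-ins for whatever model interface is used, not the team’s actual code.

```python
# Illustrative sketch of task decomposition; query_llm is a hypothetical
# stand-in for whatever LLM interface is used (not the MIT team's code).
from typing import Callable

def decompose_task(task: str, query_llm: Callable[[str], str]) -> list[str]:
    """Ask an LLM to split a high-level task into an ordered list of subtasks."""
    prompt = (
        f"Break the robot task '{task}' into a short, ordered list of "
        "subtasks, one per line."
    )
    response = query_llm(prompt)
    return [line.strip() for line in response.splitlines() if line.strip()]

# For "scoop marbles from bowl A and pour them into bowl B", the model
# might return something like ["reach", "scoop", "transport", "pour"].
```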
Additionally, the MIT team developed an algorithm that maps the robot’s physical state (e.g., its position or an image of its current state) to the corresponding subtask, a process known as ‘grounding.’ Grounding bridges what the robot is physically doing and the LLM’s description of the subtasks, allowing the robot to recognize which stage of a task it is in and respond accordingly.
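The grounding idea can be sketched as a simple classifier from observed state to subtask label. The nearest-neighbor lookup below is only an illustrative stand-in, assuming demonstration states have already been labeled with their subtasks; the team’s actual grounding algorithm may work quite differently.

```python
# Hypothetical grounding sketch: map the robot's current state (e.g. gripper
# pose) to a subtask by nearest-neighbor lookup over labeled demonstration
# states. This only illustrates the state-to-subtask mapping.
import numpy as np

def ground_state(state: np.ndarray,
                 demo_states: np.ndarray,
                 demo_labels: list[str]) -> str:
    """Return the subtask whose demonstration state is closest to `state`."""
    distances = np.linalg.norm(demo_states - state, axis=1)
    return demo_labels[int(np.argmin(distances))]

# Toy 2-D example:
demo_states = np.array([[0.0, 0.0], [0.3, 0.1], [0.6, 0.2], [0.9, 0.0]])
demo_labels = ["reach", "scoop", "transport", "pour"]
print(ground_state(np.array([0.58, 0.18]), demo_states, demo_labels))  # "transport"
```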
To demonstrate the approach, the team trained a robotic arm to move marbles from one bowl to another, with an LLM breaking the task into subtasks. During testing, the robot was subjected to intentional disruptions, such as being pushed off course or having marbles knocked off its scoop. Despite these interruptions, the robot corrected each problem at the level of the individual subtask and completed the task without starting from scratch.
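Putting the two pieces together, a recovery loop might re-ground the robot’s state after every step and resume from whichever subtask it finds itself in. The sketch below reuses the ground_state helper from above; execute and get_state are hypothetical robot-control callbacks, and the loop is only a simplified picture of subtask-level recovery.

```python
# Sketch of subtask-level recovery, reusing ground_state() from above.
# execute() and get_state() are hypothetical robot-control callbacks.
def run_with_recovery(subtasks, execute, get_state,
                      demo_states, demo_labels, max_steps: int = 50) -> bool:
    """Run subtasks in order, re-grounding after each step to recover."""
    i, steps = 0, 0
    while i < len(subtasks) and steps < max_steps:
        steps += 1
        execute(subtasks[i])
        observed = ground_state(get_state(), demo_states, demo_labels)
        if observed == subtasks[i]:
            i += 1          # the step landed where expected: move on
        else:
            # A disturbance (a nudge, spilled marbles) left the robot in a
            # different subtask's state: resume there instead of restarting.
            i = subtasks.index(observed)
    return i >= len(subtasks)
```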
The research points to a promising future for training household robots. As Yanwei Wang, a graduate student in MIT’s Department of Electrical Engineering and Computer Science (EECS), explains, the method lets robots self-correct execution errors and improve task success without explicit programming or additional human demonstrations. The team will present the study at the International Conference on Learning Representations (ICLR) in May.