Robotics have evolved significantly since its inception, with robots now being utilised across a myriad of industries, such as home monitoring, electronics, nanotechnology, and aerospace. Robots can process complex, high-dimensional data and determine the best possible actions. This is achieved through abstraction, which are condensed summaries of their observations and potential actions, allowing them to generalise across tasks. Researchers are increasingly focusing on learning these abstractions from data instead of manually crafting them.
A team of researchers at Microsoft has recently concentrated on temporal action abstractions; breaking down complex instructions into simpler tasks such as picking up objects or walking. The team believes this method holds promise for action representation learning. As part of this work, the researchers introduced Primitive Sequence Encoding (PRISE), a method designed to teach robots multi-step skills. The results demonstrated that PRISE allows robots to learn faster and with better performance than traditional approaches which train on all action codes simultaneously.
Drawing inspiration from Natural Language Processing (NLP), PRISE converts continuous robot actions into discrete codes. A vector quantization module is initially pre-trained for this, after which the Byte Pair Encoding (BPE) technique (commonly used in NLP for text compression) is applied to identify smaller routines within the action codes. Behaviour Cloning is then used to test the robot, utilising these condensed routines in lieu of the full set of instructions. This results in a process that is quicker and more efficient than other methods.
The researchers also evaluated the effectiveness of PRISE’s skill tokens. These tokens were initially pre-trained using large-scale, multitask offline datasets before being evaluated in two offline imitation learning scenarios; a multitask generalist policy and adaption to unseen tasks. The success rate of the first scenario was measured against 90 tasks in the LIBERO-90 dataset. The use of skill tokens in PRISE resulted in considerable improvement in performance when compared to other algorithms.
In terms of the unseen MetaWorld tasks, PRISE outperformed all other baselines, demonstrating its effectiveness in adapting to unseen downstream tasks. In summary, this research paper unveiled a new method, PRISE, that allows robots to learn and perform more efficiently when compared to traditional methods. The research leveraged NLP methodology called Byte Pair Encoding tokenization algorithm, resulting in efficient learning and adaptability to new tasks. The superiority of this method over other algorithms was also confirmed through the study, suggesting its potential to enhance robot performance across various tasks. The credit for this research is given to Microsoft’s research team.