Human-robot interaction presents numerous challenges, including equipping robots with human-like expressive behavior. Traditional rule-based methods scale poorly to new social contexts, while data-driven approaches are limited by their need for specific, wide-ranging datasets. As the diversity of social interactions grows, the need for more flexible, context-sensitive solutions intensifies.
Generating socially acceptable behaviors for robots and virtual humans has relied on rule-based, template-based, and data-driven methods. Rule-based methods depend on hand-crafted, formalized rules, which limits their expressivity and multimodal capabilities. Template-based methods derive interaction patterns from human traces but suffer from the same lack of expressivity. Data-driven models, built with machine learning or generative techniques, are constrained by data inefficiency and their demand for specialized datasets. Large Language Models (LLMs), by contrast, have shown promise for tasks such as planning and reacting, social reasoning, and inferring user preferences.
In recent research, Google DeepMind and the University of Toronto have proposed a new method called Generative Expressive Motion (GenEM). GenEM uses LLMs to generate expressive robot behaviors, drawing on the rich social context encoded in LLMs to create adaptable and composable robot motion. GenEM applies few-shot chain-of-thought prompting to translate human language instructions into parameterized control code built from the robot's available skills. Behaviors are generated in stages, starting from the user instruction and concluding with executable code for the robot. The approach outperforms traditional rule-based and data-driven methods, delivering improved expressivity and adaptability.
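To make the staged prompting concrete, here is a minimal sketch of how such a pipeline might look. Everything below is an illustrative assumption rather than the authors' implementation: the `query_llm` stub, the prompt wording, and the hypothetical robot skill API are all invented for exposition.

```python
# A minimal sketch of GenEM-style staged prompting. All names here
# (query_llm, the prompt wording, and the robot skill API) are
# illustrative assumptions, not the authors' implementation.

ROBOT_SKILLS = """Hypothetical robot skill API:
  robot.tilt_head(angle_deg)   # tilt the head left/right
  robot.move_base(x_m, y_m)    # relative base motion in meters
  robot.set_light(color)       # set the light strip color
  robot.pause(seconds)         # hold the current pose
"""

def query_llm(prompt: str) -> str:
    """Stub: plug in any chat-completion client here."""
    raise NotImplementedError

def generate_expressive_behavior(instruction: str) -> str:
    # Stage 1: reason about how a person would express the behavior.
    # In practice, few-shot chain-of-thought examples are prepended.
    human_expression = query_llm(
        "Describe step by step how a person would express the "
        f"following through body language: {instruction}"
    )
    # Stage 2: map that description onto the robot's available skills.
    procedure = query_llm(
        f"{ROBOT_SKILLS}\nTranslate this human expression into an "
        f"ordered sequence of the skills above:\n{human_expression}"
    )
    # Stage 3: emit executable, parameterized control code.
    return query_llm(
        f"{ROBOT_SKILLS}\nWrite a Python function behavior(robot) "
        f"implementing this procedure:\n{procedure}"
    )

# Usage: code = generate_expressive_behavior("Acknowledge a person walking by")
```

The key design idea is that each stage narrows the problem: the first reasons in open-ended human terms, the second grounds that reasoning in the robot's capabilities, and the third commits to concrete, parameterized code.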
Two user studies were conducted to assess GenEM, comparing its generated behaviors against those crafted by a professional animator. User feedback was also used to refine behavior parameters and to create new expressive behaviors by composing existing ones. The studies found that behaviors generated by GenEM were perceived as competent and understandable. Experiments on a mobile robot and a simulated quadruped further confirmed the approach's efficacy, showing that it performed better than directly translating language instructions into code.
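The feedback loop could follow the same prompting pattern. Below is a hedged sketch of iterative refinement that reuses the `query_llm` stub and hypothetical skill API from the previous example; the prompt wording is again an assumption, not the paper's procedure.

```python
def refine_behavior(code: str, feedback: str) -> str:
    # Feed the current behavior code and the user's feedback back to
    # the LLM, asking it to adjust parameters or compose behaviors.
    return query_llm(
        f"{ROBOT_SKILLS}\nCurrent behavior code:\n{code}\n"
        f"User feedback: {feedback}\n"
        "Return updated Python code implementing the revised behavior."
    )

# Usage: code = refine_behavior(code, "Nod more slowly and flash the light")
```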
In conclusion, GenEM represents a significant advance in robotics, demonstrating that Large Language Models can produce expressive, adaptable, and composable robot behaviors without hand-crafted rules or specialized datasets. This progress broadens the role of LLMs in robotics, emphasizing their potential to foster effective human-robot interaction through autonomous expressive behavior generation.