Researchers from Carnegie Mellon University and Google DeepMind have developed In-Context Abstraction Learning (ICAL), a novel approach for teaching vision-language models (VLMs). Unlike traditional methods, ICAL guides VLMs to build multimodal abstractions in new domains, allowing machines to better understand and learn from their experiences.
This is achieved by focusing on four kinds of cognitive abstractions: task and causal relationships, changes in object states, temporal abstractions, and task construals. ICAL uses this information to optimize the agent's trajectory, generating relevant verbal and visual abstractions from both successful and unsuccessful attempts. Human language input serves as further guidance, refining these abstractions.
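To make this loop concrete, here is a minimal Python sketch of how it could be structured. Everything here is an illustrative assumption rather than the paper's implementation: `query_vlm` and `get_human_feedback` are hypothetical stand-ins for a real VLM API and a human-in-the-loop interface, and the prompts are placeholders.

```python
# A minimal sketch of the ICAL abstraction-learning loop.
# `query_vlm` and `get_human_feedback` are hypothetical stubs, not a real API.
from dataclasses import dataclass, field

# The four kinds of cognitive abstractions described above.
ABSTRACTION_TYPES = [
    "task_and_causal_relationships",
    "object_state_changes",
    "temporal_abstractions",
    "task_construals",
]

@dataclass
class Example:
    trajectory: list[str]          # corrected action sequence
    abstractions: dict[str, str]   # one annotation per abstraction type
    frames: list = field(default_factory=list)  # associated visual observations

def query_vlm(prompt: str, frames: list) -> str:
    """Hypothetical stand-in for a vision-language model call."""
    return "..."  # a real implementation would call a VLM API here

def get_human_feedback(example: Example) -> str:
    """Hypothetical placeholder for collecting human language feedback."""
    return ""

def ical_step(noisy_trajectory: list[str], frames: list,
              memory: list[Example]) -> Example:
    # 1) Abstraction phase: the VLM corrects the noisy trajectory and
    #    annotates it with the four kinds of abstractions, conditioning
    #    on previously learned examples (in-context).
    context = "\n".join(str(ex.abstractions) for ex in memory[-5:])
    corrected = query_vlm(
        f"Context:\n{context}\nFix errors in this trajectory: {noisy_trajectory}",
        frames,
    ).splitlines()
    abstractions = {
        kind: query_vlm(f"Describe the {kind} in: {corrected}", frames)
        for kind in ABSTRACTION_TYPES
    }
    example = Example(corrected, abstractions, frames)

    # 2) Human-in-the-loop phase: natural-language feedback refines the
    #    abstractions as the agent executes the corrected plan.
    feedback = get_human_feedback(example)
    if feedback:
        example.abstractions = {
            kind: query_vlm(f"Revise given feedback '{feedback}': {text}", frames)
            for kind, text in example.abstractions.items()
        }

    # Refined examples are stored and reused as in-context demonstrations.
    memory.append(example)
    return example

# Example usage: process one noisy demonstration and grow the example memory.
memory: list[Example] = []
ical_step(["go to kitchen", "pick up mug", "place mug in sink"], [], memory)
```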
ICAL's main advantage over traditional methods is that it captures a comprehensive picture of each task, emphasizing dynamics and critical knowledge rather than merely storing successful action plans. Each round of abstraction generation builds on previously derived abstractions, improving both the model's execution and its ability to abstract. The result is a concise summary of rules, action sequences, state transitions, and visual representations, expressed in natural language.
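Building on the hypothetical `Example` record from the sketch above, the snippet below illustrates how such a memory might be reused at inference time: retrieve the most relevant stored examples and prepend their abstractions as in-context demonstrations. The word-overlap retrieval heuristic is purely illustrative, not the paper's actual mechanism.

```python
def retrieve(memory: list[Example], task: str, k: int = 3) -> list[Example]:
    """Rank stored examples by naive word overlap with the new task
    (an illustrative stand-in for a real retrieval mechanism)."""
    task_words = set(task.lower().split())
    def score(ex: Example) -> int:
        text = " ".join(ex.abstractions.values()).lower()
        return sum(w in text for w in task_words)
    return sorted(memory, key=score, reverse=True)[:k]

def build_prompt(task: str, memory: list[Example]) -> str:
    """Prepend retrieved abstractions and action sequences as in-context
    examples ahead of the new task description."""
    blocks = [
        f"Abstractions: {ex.abstractions}\nActions: {ex.trajectory}"
        for ex in retrieve(memory, task)
    ]
    return "\n\n".join(blocks + [f"New task: {task}"])
```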
The researchers then tested ICAL on three AI benchmarks: VisualWebArena (multimodal web tasks), TEACh (household instruction following), and Ego4D (egocentric video forecasting). The agent's success, particularly on the TEACh benchmark, demonstrated the effectiveness of ICAL-taught abstractions for in-context learning. Notably, ICAL improved goal-condition success by 12.6% over the previous best method, HELPER.
To summarize, ICAL shows significant promise: it consistently outperforms in-context learning without such abstractions, reducing the need for meticulously constructed examples. Beyond that, it competes closely with fully supervised approaches despite using 639 times less in-domain training data. The research team plans to further investigate ICAL's remaining challenges and opportunities for improvement, such as handling noisy demonstrations and its dependency on a static action API.
With the development of ICAL, there is growing optimism about its future contribution to the field of AI learning, particularly for VLMs. By incorporating the way humans accumulate experience and derive insights from it, ICAL represents a significant step toward machine learning that mimics human learning abilities.