DeepMind presents SIMA, a versatile AI agent for 3D virtual environments.

DeepMind’s Scalable, Instructable, Multiworld Agent (SIMA) pushes the boundaries of artificial intelligence (AI) by learning a human-like understanding of instructions and adapting to a variety of simulated 3D environments and video games. Unlike traditional game-playing AI that excels only at specific tasks, SIMA’s agents are trained to interpret natural-language commands and convert them into concrete in-game actions.

The core of the project is an extensive, diverse dataset of human gameplay collected across numerous research environments and commercial video games. A variety of titles, including popular games such as “No Man’s Sky” and “Teardown,” were used to train and test SIMA. These games challenged SIMA to develop a range of skills, from basic navigation and resource gathering to more complex activities such as crafting and piloting a spaceship. Four research environments were also included in SIMA’s training curriculum to assess its object-interaction and physical-manipulation capabilities.
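
To make the shape of such a dataset concrete, the sketch below shows how a single recorded gameplay step might be represented. The field names and types here are illustrative assumptions for this article, not DeepMind’s actual data schema.

```python
from dataclasses import dataclass

# Illustrative sketch only: these field names and types are assumptions
# made for this article, not DeepMind's actual dataset schema.

@dataclass
class GameplayStep:
    frame: bytes                   # screen capture, e.g. a PNG-encoded image
    instruction: str               # natural-language command, e.g. "chop down the tree"
    keys_pressed: list[str]        # keyboard input recorded from the human player
    mouse_delta: tuple[int, int]   # relative mouse movement (dx, dy)
    game: str                      # source title, e.g. "No Man's Sky"

# A full trajectory is simply an ordered sequence of such steps.
Trajectory = list[GameplayStep]
```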

Several key features set SIMA’s architecture apart from other game-playing AIs. SIMA does not depend on access to a game’s source code or on custom APIs: it works from the same on-screen imagery a human player sees, together with instructions given by the user, and carries out tasks through keyboard and mouse actions. Under the hood, SIMA employs pre-trained vision and video-prediction models that are fine-tuned to the specific 3D settings of the games it plays.
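
That interaction loop can be sketched in a few lines of Python. Every name below (capture_screen, vision_encoder, policy, send_input) is a placeholder invented for illustration; SIMA’s real components are not public at this level of detail.

```python
# Hypothetical sketch of SIMA's image-in, keyboard-and-mouse-out loop.
# All names here (capture_screen, vision_encoder, policy, send_input)
# are illustrative placeholders, not real SIMA components.

def agent_loop(instruction: str, vision_encoder, policy, env) -> None:
    """Follow one language instruction: observe pixels, act via keyboard/mouse."""
    while not env.done():
        frame = env.capture_screen()               # raw pixels only, no game API
        features = vision_encoder(frame)           # pre-trained vision model
        action = policy(features, instruction)     # conditioned on the instruction
        env.send_input(action.keys, action.mouse)  # same interface a human uses
```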

During evaluation, the agent demonstrated proficiency in over 600 basic skills, ranging from navigation and object interaction to using menus. Rather than mastering a single game or a fixed set of problems, however, DeepMind designed SIMA to be adaptable: the agent is trained to understand and execute instructions across many different virtual worlds, bringing it closer to a genuinely general, instructable AI.

SIMA’s training setup, designed by DeepMind, prioritises adaptability and understanding. The agent’s inputs are images from a 3D environment and natural-language instructions from a user. By connecting language with perception and action, SIMA is mastering the art of ‘getting’ our instructions and acting upon them.
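
Learning from recorded human gameplay in this way is commonly framed as behavioural cloning: the model is trained to predict the action the human took, given the current frame and the instruction. The sketch below expresses that idea in generic PyTorch-style code; the loss choice and model interfaces are assumptions, not DeepMind’s published training recipe.

```python
import torch.nn.functional as F

# Hypothetical behavioural-cloning step. The model interface and the
# cross-entropy loss are assumptions for illustration, not SIMA's
# published training code.

def training_step(model, optimizer, batch) -> float:
    # batch holds recorded human gameplay: screen frames, the instruction
    # in force, and the discretised action the human actually took.
    logits = model(batch["frames"], batch["instruction_tokens"])
    loss = F.cross_entropy(logits, batch["human_actions"])  # imitate the human
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```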

This project is moving the field of artificial intelligence a step closer to the ultimate goal of truly intelligent, instructable AI agents, effectively blurring the lines between human and machine understanding.
