Google DeepMind’s Genie is a generative model that turns a single image or text prompt into a dynamic, interactive world. Trained on over 200,000 hours of in-game video footage, Genie learns to reproduce the physics, aesthetics, and dynamics of many different environments and objects. The final model, documented in a research paper, has roughly 11 billion parameters.
Genie can turn a photo of your living room or garden into a playable 2D platformer level. Users supply an image or a hand-drawn sketch, which Genie transforms into an interactive environment. They then steer the output by choosing a latent action at each time step, so the generated environment can be controlled frame by frame.
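The frame-by-frame interaction described above can be pictured as a simple loop: start from a prompt frame, pick one latent action per time step, and let the model produce the next frame. The sketch below is only an illustration under stated assumptions. `toy_world_model`, `play`, and the action count of 8 are hypothetical stand-ins, not Genie’s actual interface, and the "frame" is a row of characters rather than an image.

```python
from typing import Callable, List

Frame = List[str]

# Assumption: a small, discrete set of latent actions (8 is an illustrative choice).
NUM_LATENT_ACTIONS = 8


def toy_world_model(history: List[Frame], action: int) -> Frame:
    """Hypothetical stand-in for a learned world model.

    The "world" is a row of cells with a single character '@'.
    Action 0 moves it left, action 1 moves it right, and every
    other action leaves the frame unchanged.
    """
    frame = history[-1]
    pos = frame.index("@")
    step = {0: -1, 1: 1}.get(action, 0)
    new_pos = max(0, min(len(frame) - 1, pos + step))  # clamp to the row
    next_frame = ["."] * len(frame)
    next_frame[new_pos] = "@"
    return next_frame


def play(
    prompt_frame: Frame,
    actions: List[int],
    model: Callable[[List[Frame], int], Frame] = toy_world_model,
) -> List[Frame]:
    """Generate one new frame per user-chosen latent action."""
    history = [list(prompt_frame)]
    for action in actions:
        if not 0 <= action < NUM_LATENT_ACTIONS:
            raise ValueError(f"unknown latent action: {action}")
        # Each new frame is conditioned on everything generated so far.
        history.append(model(history, action))
    return history


if __name__ == "__main__":
    for frame in play(list("@..."), [1, 1, 0, 5]):
        print("".join(frame))
```

The point of the sketch is the control flow, not the model: the user never edits a frame directly, and only influences the world through the action chosen at each step.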
Genie comprises three key components: a spatiotemporal video tokenizer, an autoregressive dynamics model, and a Latent Action Model (LAM). Spatiotemporal transformers are central to the design because they can process sequences of video frames over time. The LAM infers the latent actions taken between video frames, and these actions are what control events in the interactive environment. In operation, the video tokenizer compresses raw video frames into discrete tokens, which the dynamics model then uses to predict subsequent frames.
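To make the division of labor among the three components concrete, here is a minimal toy sketch. It assumes a nearest-neighbor codebook lookup for the tokenizer and a three-action "shift" world for the LAM and dynamics model. All names (`CODEBOOK`, `ACTIONS`, `tokenize`, `infer_latent_action`, `predict_next`) are hypothetical, and the hand-written rules stand in for what would be learned transformers in the real system.

```python
from typing import Dict, List, Sequence

# Hypothetical codebook: token id -> representative "pixel" value.
CODEBOOK: List[float] = [0.0, 0.25, 0.5, 0.75, 1.0]

# Hypothetical latent actions: action id -> cyclic shift applied to the frame.
ACTIONS: Dict[int, int] = {0: 0, 1: 1, 2: -1}


def tokenize(frame: Sequence[float]) -> List[int]:
    """Tokenizer stand-in: map each value to its nearest codebook index."""
    return [
        min(range(len(CODEBOOK)), key=lambda k: abs(CODEBOOK[k] - x))
        for x in frame
    ]


def detokenize(tokens: Sequence[int]) -> List[float]:
    """Map token ids back to their codebook values."""
    return [CODEBOOK[t] for t in tokens]


def predict_next(tokens: Sequence[int], action: int) -> List[int]:
    """Dynamics stand-in: shift the token row according to the action."""
    n = len(tokens)
    shift = ACTIONS[action]
    return [tokens[(i - shift) % n] for i in range(n)]


def infer_latent_action(prev: Sequence[int], nxt: Sequence[int]) -> int:
    """LAM stand-in: pick the action that best explains prev -> nxt."""
    def mismatches(action: int) -> int:
        return sum(p != q for p, q in zip(predict_next(prev, action), nxt))
    return min(ACTIONS, key=mismatches)


if __name__ == "__main__":
    tokens = tokenize([0.02, 0.9, 0.1, 0.0])
    moved = predict_next(tokens, action=1)
    print(tokens, moved, infer_latent_action(tokens, moved))
```

In this toy, the action labels are never given alongside the frames. `infer_latent_action` recovers them from pairs of frames alone, which mirrors the role the text assigns to the LAM, while `predict_next` consumes tokens plus an action in the way the dynamics model is described as doing.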
The DeepMind team sees Genie as a way for individuals to generate personalized game-like experiences. In robotics, the team showcased Genie’s potential by having it infer the actions performed by real robot arms interacting with real-world objects.
Despite the model’s impressive results, DeepMind is well aware that it could be misused. To limit that risk, neither the trained model checkpoints, nor the training dataset, nor examples from that data were released alongside the research paper or website.
This work fits into DeepMind’s broader line of game-based research aimed at real-world applications, which includes the XLand project and SIMA. These projects use video games as a sandbox for training and testing AI models. DeepMind thus continues to explore the intersection of AI and gaming, aiming to foster creativity and innovation while weighing the potential risks and benefits.