Glossary

Genie 2

Genie 2 is a foundation world model released by Google DeepMind on 4 December 2024, succeeding the 2D Genie of February 2024. It generates playable 3D environments in real time from a single image prompt, with the model itself simulating the world's physics and visual response to player actions.

Behaviour. Given any image (a photograph, a painting, a generated frame) and a stream of keyboard or controller inputs, Genie 2 produces video frames consistent with the inputs as if the user were playing a video game inside the image. The model has learnt:

  • Object interaction (collisions, pushing, breaking).
  • Character animation for a controllable avatar.
  • Physics: gravity, water effects, smoke, lighting and reflections.
  • NPCs that move and react.
  • Long-horizon memory: when the player turns away from an object and back, the object is reconstructed consistently for up to roughly a minute of generated play.

Genie 2 is autoregressive at the frame level, conditioned on prior frames and the latest action token. Generation runs in real time on a single accelerator at modest resolution.
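
How such a loop fits together can be sketched in a few lines of Python. This is an illustration under assumed names throughout, not DeepMind's implementation: Genie 2's architecture and API are unpublished.

```python
# Minimal sketch of frame-level autoregressive generation conditioned on
# actions. Every identifier here is a placeholder; DeepMind has not
# published Genie 2's internals.
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class WorldModelRollout:
    model: Any                        # assumed: exposes predict_next_frame()
    context: List[Any] = field(default_factory=list)

    def reset(self, prompt_image):
        """Seed the rollout with a single prompt image."""
        self.context = [prompt_image]
        return prompt_image

    def step(self, action):
        """Generate one frame, conditioned on all prior frames plus the
        latest action, then append it to the rolling context."""
        frame = self.model.predict_next_frame(self.context, action)
        self.context.append(frame)
        return frame


# Interactive play is then a simple loop (input sources are hypothetical):
# rollout = WorldModelRollout(model=frame_model)
# display(rollout.reset(load_image("castle.png")))
# for action in keyboard_stream():      # e.g. "forward", "jump", ...
#     display(rollout.step(action))
```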

Training. The model was trained on a large corpus of video paired with inferred action sequences. Crucially, Genie 2 learns from unlabelled video: action information is recovered by an inverse-dynamics model, so any video footage can serve as training data without action annotations. This is the same insight that powered the original Genie.
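
A sketch of what inverse-dynamics labelling looks like in practice, with purely hypothetical names, since the actual pipeline is unpublished:

```python
# Hypothetical sketch of inverse-dynamics labelling: a separately trained
# model infers which action best explains each frame-to-frame transition,
# turning raw, unlabelled video into action-conditioned training examples.
def label_video(frames, inverse_dynamics):
    """Yield (context, inferred_action, target_frame) training triples."""
    for t in range(1, len(frames)):
        # Which action explains the transition frames[t-1] -> frames[t]?
        action = inverse_dynamics.infer_action(frames[t - 1], frames[t])
        yield frames[:t], action, frames[t]
```

The world model is then trained to predict the target frame from the context and the inferred action, the same conditioning used at play time.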

Foundational role. DeepMind framed Genie 2 explicitly as a substrate for embodied AI training. An agent can be dropped into a freshly generated environment, take actions, observe consequences, and learn, all inside the model. This addresses one of embodied AI's hardest problems: the scarcity of diverse, safe, controllable training environments. Real-world robot data is expensive and slow; simulators are limited to hand-built worlds; Genie 2 offers an open-ended, image-promptable world generator.
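
In these terms the world model itself plays the role of a reinforcement-learning environment. The sketch below is an assumption-laden illustration: the reward function, the horizon and every method name are invented, and no such public API exists for Genie 2.

```python
# Assumed sketch: a generated world wrapped as an RL-style environment so
# an agent can act, observe consequences and learn entirely inside the
# model. All names here are invented for illustration.
class WorldModelEnv:
    def __init__(self, world_model, prompt_image, reward_fn, horizon=600):
        self.world_model = world_model
        self.prompt_image = prompt_image
        self.reward_fn = reward_fn       # task-specific frame scorer (assumed)
        self.horizon = horizon           # cap episodes, e.g. ~1 min at 10 fps
        self.frames = []

    def reset(self):
        self.frames = [self.prompt_image]
        return self.prompt_image

    def step(self, action):
        frame = self.world_model.predict_next_frame(self.frames, action)
        self.frames.append(frame)
        done = len(self.frames) >= self.horizon
        return frame, self.reward_fn(frame), done, {}
```

Prompting with a fresh image yields a fresh environment, which is exactly what makes the approach attractive for open-ended agent training.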

DeepMind demonstrated Genie 2 as a training environment for SIMA, its instruction-following 3D agent, generating novel rooms and tasks for SIMA to learn in. The same approach is being explored by robotics groups for sim-to-real pipelines.

Position. Genie 2 is the most prominent example of the world-model thread in 2024-2026 AI research, alongside OpenAI's Sora (video generation, less interactive), Wayve's GAIA (driving worlds) and various open replications. What distinguishes world models from pure video generators is action conditioning: world models accept actions and produce consistent, controllable rollouts; video generators do not.
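
The contrast can be stated as two interface signatures. This is conceptual only, not any published API:

```python
# Conceptual contrast, not a published API. A video generator maps a prompt
# to a finished clip; a world model consumes actions as the rollout unfolds.
from typing import Iterable, List, Protocol


class VideoGenerator(Protocol):
    def generate(self, prompt: str) -> List["Frame"]:
        """Prompt in, clip out; no way to intervene mid-rollout."""


class WorldModel(Protocol):
    def rollout(self, prompt_image: "Frame",
                actions: Iterable["Action"]) -> Iterable["Frame"]:
        """Each action steers the next frame, so rollouts are controllable
        and must stay consistent with everything generated so far."""
```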

Genie 2 is research-only as of early 2026, with no public API, but its descendants are widely expected to underpin the next generation of robot training and game-development tooling.

Related terms: Reasoning Model Training, Gemini 2.x
