Glossary

Gemini Robotics

Gemini Robotics is the family of robot foundation models released by Google DeepMind in March 2025, with a follow-up Gemini Robotics 1.5 in mid-2025. It is the spiritual successor to RT-2, now built on the Gemini 2.0 backbone instead of PaLM-E or PaLI-X.

Two-model design.

  1. Gemini Robotics. The action model. A fine-tuned Gemini 2.0 with an action head that emits 7-DoF or 14-DoF (bimanual) actions at $\sim$50 Hz. Uses an action representation similar to OpenVLA (discrete bins as text tokens) plus continuous-output heads for fine motor control.

  2. Gemini Robotics-ER (Embodied Reasoning). A reasoning model used for high-level planning and spatial understanding. It does not emit actions directly; instead, it answers questions about scenes ("which object should I pick up next?", "where is the door handle?"), localises objects with bounding boxes, and predicts grasps and trajectories as 2D/3D coordinates that downstream low-level controllers execute.

A typical pipeline runs Gemini Robotics-ER for the plan and waypoints, then Gemini Robotics for closed-loop motor control.

Capabilities demonstrated.

  • Folding origami, packing lunchboxes, playing card games (dexterous manipulation).
  • Cross-embodiment generalisation: a single weight set drives ALOHA bimanual setup, Apptronik humanoid, and Franka arms.
  • Long-horizon tasks through Gemini's chain-of-thought: "make a salad" decomposes into reach, grasp, slice, plate, repeat.
  • Safety. Constitutional-AI-style safety training to refuse unsafe actions (e.g. stop if a human enters the workspace).

Architectural notes. Google has not published the full architecture. Public statements describe a Gemini 2.0 transformer with a "robotics action expert" head, and indicate that the model is co-trained on robot trajectories and the original Gemini multimodal corpus, preserving language and reasoning. The Gemini Robotics paper (March 2025) reports state-of-the-art results on the ASIMOV safety benchmark and a new dexterous manipulation suite.

Apptronik partnership. Gemini Robotics is the policy used in Apollo, Apptronik's humanoid robot, in a commercial deployment partnership announced alongside the model release.

Significance. Gemini Robotics is the first frontier robot foundation model from a major AI lab built on a current-generation multimodal LLM (rather than a 2-year-old base like PaLM-E was). Its release signalled that frontier robotics is now a first-class workload at Google DeepMind, alongside language and search.

Related terms: RT-1 and RT-2, Embodied AI, Gemini Multimodal, Gemini 2.x, OpenVLA, Pi-Zero, Helix

Discussed in:

This site is currently in Beta. Contact: Chris Paton

Textbook of Usability · Textbook of Digital Health

Auckland Maths and Science Tutoring

AI tools used: Claude (research, coding, text), ChatGPT (diagrams, images), Grammarly (editing).