Gemini Robotics is a family of robot foundation models released by Google DeepMind in March 2025, followed by Gemini Robotics 1.5 in September 2025. It is the spiritual successor to RT-2, built on a Gemini 2.0 backbone rather than on PaLM-E or PaLI-X, the bases used for RT-2.
Two-model design.
Gemini Robotics. The action model: a fine-tuned Gemini 2.0 with an action head that emits 7-DoF or 14-DoF (bimanual) actions at roughly 50 Hz. It uses an action representation similar to OpenVLA's (continuous values discretised into bins and emitted as text tokens), plus continuous-output heads for fine motor control.
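The "discrete bins as text tokens" idea can be illustrated with a minimal sketch. This is not Google's published method: the bin count (256), the normalised action range of [-1, 1], the uniform binning, and the `<act_k>` token format are all assumptions chosen for illustration.

```python
import numpy as np

N_BINS = 256            # assumed number of bins per action dimension
LOW, HIGH = -1.0, 1.0   # assumed normalised action range


def discretize(action):
    """Map continuous action values in [LOW, HIGH] to integer bin indices."""
    clipped = np.clip(np.asarray(action, dtype=np.float64), LOW, HIGH)
    scaled = (clipped - LOW) / (HIGH - LOW)  # now in [0, 1]
    # The top edge (scaled == 1.0) would land in bin N_BINS, so cap it.
    return np.minimum((scaled * N_BINS).astype(np.int64), N_BINS - 1)


def undiscretize(bins):
    """Map bin indices back to the centre value of each bin."""
    width = (HIGH - LOW) / N_BINS
    return LOW + (np.asarray(bins, dtype=np.float64) + 0.5) * width


def to_tokens(bins):
    """Render bin indices as hypothetical text tokens, one per dimension."""
    return " ".join(f"<act_{int(b)}>" for b in bins)


# A 7-DoF action: six pose deltas plus a gripper command (assumed layout).
action = np.array([-1.0, -0.5, 0.0, 0.25, 0.5, 0.75, 1.0])
bins = discretize(action)
tokens = to_tokens(bins)
recovered = undiscretize(bins)
```

With uniform bins the round-trip error is bounded by the bin width, which is one reason a tokenised representation might be paired with continuous heads when finer resolution is needed.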
Gemini Robotics-ER (Embodied Reasoning). A reasoning model used for high-level planning and spatial understanding. It does not emit actions directly; instead, it answers questions about scenes ("which object should I pick up next?", "where is the door handle?"), localises objects with bounding boxes, and predicts grasps and trajectories as 2D/3D coordinates that downstream low-level controllers execute.
A typical pipeline runs Gemini Robotics-ER for the plan and waypoints, then Gemini Robotics for closed-loop motor control.
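The division of labour in such a pipeline can be sketched with toy stand-ins. Everything below is hypothetical: `plan` and `policy` are placeholder functions, not calls to any Gemini API, and the "robot" is a point moving in a 2D plane. The sketch only shows the control structure, in which the planner runs once per task and the policy re-observes the state on every tick.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Point = Tuple[float, float]


@dataclass
class Step:
    instruction: str  # natural-language sub-goal from the planner
    waypoint: Point   # target position for this sub-goal


def plan(task: str, scene: Dict[str, Point]) -> List[Step]:
    """Hypothetical stand-in for the reasoning model: one step per object."""
    return [Step(f"move to {name}", pos) for name, pos in scene.items()]


def policy(position: Point, step: Step, gain: float = 0.5) -> Point:
    """Hypothetical stand-in for the action model: proportional control."""
    return (gain * (step.waypoint[0] - position[0]),
            gain * (step.waypoint[1] - position[1]))


def run(task: str, scene: Dict[str, Point], start: Point,
        tol: float = 1e-3, max_ticks: int = 100):
    position = start
    completed = []
    for step in plan(task, scene):      # high-level plan, computed once
        for _ in range(max_ticks):      # closed loop: act, then re-observe
            dx, dy = policy(position, step)
            position = (position[0] + dx, position[1] + dy)
            if (abs(step.waypoint[0] - position[0]) < tol
                    and abs(step.waypoint[1] - position[1]) < tol):
                completed.append(step.instruction)
                break
    return position, completed


scene = {"cup": (0.4, 0.2), "plate": (0.8, 0.6)}
final_position, completed = run("set the table", scene, start=(0.0, 0.0))
```

In a real system the planner would typically be re-queried when a step fails or the scene changes; the sketch omits replanning to keep the two roles visible.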
Capabilities demonstrated.
- Folding origami, packing lunchboxes, playing card games (dexterous manipulation).
- Cross-embodiment generalisation: a single weight set drives the ALOHA bimanual setup, Apptronik's humanoid, and Franka arms.
- Long-horizon tasks through Gemini's chain-of-thought: "make a salad" decomposes into reach, grasp, slice, plate, repeat.
- Safety: Constitutional-AI-style safety training teaches the model to refuse unsafe actions (e.g. stopping if a human enters the workspace).
Architectural notes. Google has not published the full architecture. Public statements describe a Gemini 2.0 transformer with a "robotics action expert" head, and indicate that the model is co-trained on robot trajectories and the original Gemini multimodal corpus, preserving language and reasoning. The Gemini Robotics paper (March 2025) reports state-of-the-art results on the ASIMOV safety benchmark and a new dexterous manipulation suite.
Apptronik partnership. Gemini Robotics is the policy used in Apollo, Apptronik's humanoid robot, in a commercial deployment partnership announced alongside the model release.
Significance. Gemini Robotics is the first frontier robot foundation model from a major AI lab to be built on a current-generation multimodal LLM, rather than on a base that was already two years old, as PaLM-E was. Its release signalled that frontier robotics is now a first-class workload at Google DeepMind, alongside language and search.
Related terms: RT-1 and RT-2, Embodied AI, Gemini Multimodal, Gemini 2.x, OpenVLA, Pi-Zero, Helix
Discussed in:
- Chapter 16: Ethics & Safety, Embodied AI