Helix is the vision-language-action (VLA) model for humanoid robots released by Figure AI in February 2025. It is the first publicly disclosed VLA explicitly designed around a two-system architecture inspired by Kahneman's "Thinking, Fast and Slow", and was demonstrated on Figure's 02 humanoid platform performing collaborative kitchen tasks.
Two-system architecture. A naive monolithic VLM running at $\sim$10 Hz cannot drive a 35-DoF humanoid, which needs $\sim$200 Hz control to remain stable. Helix solves this with a frequency split:
System 2: 7B-parameter VLM. Runs at 7–9 Hz on the robot's compute. Takes an RGB image stream and a language instruction; outputs a 512-dimensional latent vector $z_t$ encoding the current sub-goal. Provides semantic understanding, generalisation, and plan adjustment.
System 1: 80M-parameter visuomotor policy. Runs at 200 Hz. Takes raw images, robot proprioception, and the latent $z_t$ from System 2; outputs the next joint-target vector for all 35 DoF. Uses a small transformer trained with behaviour cloning on teleoperated humanoid demonstrations.
Mathematically, the policy decomposes as
$$\pi(a_t \mid o_t, \ell) = \pi_{\text{S1}}(a_t \mid o_t, z_{\lfloor t / k \rfloor}), \qquad z_\tau = \pi_{\text{S2}}(o_\tau, \ell)$$
where $\ell$ is the language instruction and $k$ is the ratio of the two control rates (about 25 in Helix: 200 Hz divided by $\sim$8 Hz). The two systems are trained jointly, so the latent $z_\tau$ is meaningful to System 1.
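The two-rate loop implied by this decomposition can be sketched in a few lines. The stand-in functions below are illustrative only (random outputs in place of the real 7B VLM and 80M policy); the dimensions and the ratio $k = 25$ come from the text.

```python
import numpy as np

# Dimensions from the text: 512-d latent, 35 joint targets, k = 200 Hz / ~8 Hz.
LATENT_DIM, ACTION_DIM, K = 512, 35, 25
rng = np.random.default_rng(0)

def system2(image, instruction):
    """Stand-in for the 7B VLM: maps (image, language) to a latent sub-goal z."""
    return rng.standard_normal(LATENT_DIM)

def system1(image, proprio, z):
    """Stand-in for the 80M visuomotor policy: latent-conditioned joint targets."""
    return rng.standard_normal(ACTION_DIM)

def control_loop(steps, image, proprio, instruction):
    z, actions, refreshes = None, [], []
    for t in range(steps):
        if t % K == 0:                      # System 2 refreshes z at ~8 Hz
            z = system2(image, instruction)
            refreshes.append(t)
        # System 1 runs at every 200 Hz tick, reusing z_{floor(t/K)}
        actions.append(system1(image, proprio, z))
    return actions, refreshes

actions, refreshes = control_loop(100, image=None, proprio=None,
                                  instruction="put away the groceries")
# 100 fast control ticks, but only 4 slow System-2 refreshes (t = 0, 25, 50, 75)
```

In a real deployment the two systems run asynchronously on separate processes, with System 1 always consuming the most recent latent; the synchronous loop above only illustrates the indexing $z_{\lfloor t/k \rfloor}$.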
Headline capabilities.
- Two robots collaborating. The first publicly demonstrated humanoid VLA in which two robots running identical model weights cooperate on a shared task (putting groceries away) from a common language instruction.
- Whole upper-body control. Drives 35 DoF (two 7-DoF arms, two grippers, head, torso) end-to-end from pixels to joint targets.
- Generalisation to novel objects. Picks up household items the model never saw in training, by leveraging the VLM's semantic prior.
Training data. Figure has not disclosed exact numbers. Public statements indicate $\sim$500 hours of teleoperated humanoid data: a small dataset by autonomous-driving standards, but large for humanoids. The VLM component is initialised from a pretrained open-source backbone.
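The joint training described above (behaviour cloning on demonstrations, with gradients flowing through the shared latent into System 2) can be sketched with linear stand-ins. Everything here is illustrative, not Figure's recipe: toy dimensions, a single demonstration frame, hand-derived gradients for the linear models.

```python
import numpy as np

rng = np.random.default_rng(0)
OBS, LATENT, ACT = 16, 8, 35   # toy sizes; Helix uses 512-d latents and 35 DoF

# Linear stand-ins for the two systems (the real ones are a 7B VLM and an 80M transformer).
W2 = rng.standard_normal((LATENT, OBS)) * 0.1        # System 2: obs -> latent
W1 = rng.standard_normal((ACT, OBS + LATENT)) * 0.1  # System 1: [obs, latent] -> action

def loss_and_grads(o, a_demo):
    z = W2 @ o                             # System 2 forward
    x = np.concatenate([o, z])
    a = W1 @ x                             # System 1 forward, conditioned on z
    err = a - a_demo                       # behaviour-cloning error vs. demo action
    loss = float(err @ err)
    dW1 = 2 * np.outer(err, x)
    dz = 2 * W1[:, OBS:].T @ err           # gradient flows through the latent...
    dW2 = np.outer(dz, o)                  # ...into System 2: this is the joint training
    return loss, dW1, dW2

o = rng.standard_normal(OBS)
a_demo = rng.standard_normal(ACT)          # one teleoperated demonstration frame
lr = 1e-2
losses = []
for _ in range(200):
    loss, dW1, dW2 = loss_and_grads(o, a_demo)
    W1 -= lr * dW1
    W2 -= lr * dW2
    losses.append(loss)
```

The point of the sketch is the `dW2` line: because System 2's parameters receive gradient from System 1's imitation loss, the latent is shaped into whatever representation the fast policy finds useful, rather than being a hand-designed interface.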
Significance. Helix demonstrated that the slow-VLM/fast-policy split is a practical answer to the latency problem in humanoid control; $\pi_0$ uses a related design with a VLM backbone and an action expert, though Helix couples its two systems more loosely, through a single latent vector. As of 2025 it is one of three frontier humanoid VLA stacks publicly demonstrated, alongside Gemini Robotics (on Apptronik's Apollo) and Tesla's FSD-derived Optimus stack.
Related terms: Embodied AI, RT-1 and RT-2, Gemini Robotics, Pi-Zero, OpenVLA, Vision-Language Model
Discussed in:
- Chapter 16: Ethics & Safety, Embodied AI