Model predictive control (MPC), also known as receding-horizon control, is the dominant modern approach to constrained optimal control. At each timestep $t$, MPC solves a finite-horizon optimisation over a sequence of future actions, applies only the first action, observes the new state, and repeats. The receding horizon is the source of the method's robustness: because the planner re-plans at every step from the latest measured state, it acts as a feedback controller and continually corrects for model error and disturbances.
The optimisation at timestep $t$ is:
$$\min_{u_{t:t+H-1}} \sum_{k=0}^{H-1} L(x_{t+k}, u_{t+k}) + V_f(x_{t+H})$$ $$\text{subject to} \quad x_{t+k+1} = f(x_{t+k}, u_{t+k}), \quad x_{t+k} \in \mathcal{X}, \quad u_{t+k} \in \mathcal{U}$$
Here $H$ is the prediction horizon, $L$ is the running cost (e.g. tracking error plus control effort), $V_f$ is a terminal cost, $f$ is the dynamics model, and $\mathcal{X}, \mathcal{U}$ are state and input constraints (joint limits, actuator saturation, obstacle-avoidance regions). After solving, the controller applies $u_t^*$, the system evolves to $x_{t+1}$, and the optimisation is solved afresh from there.
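To make the loop concrete, here is a minimal receding-horizon sketch in Python for the simplest case, linear dynamics with quadratic cost (discussed in the next paragraph), using cvxpy. The double-integrator dynamics, horizon length, cost weights, and input bound are illustrative assumptions, not values from the text.

```python
# Minimal linear MPC: solve the QP, apply the first action, repeat.
import numpy as np
import cvxpy as cp

# Double integrator: state x = [position, velocity], input u = acceleration.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.005], [0.1]])
H = 20                      # prediction horizon
Q = np.diag([10.0, 1.0])    # running state cost
R = np.array([[0.1]])       # running input cost

x = cp.Parameter(2)                     # current state, updated each step
X = cp.Variable((2, H + 1))             # predicted states
U = cp.Variable((1, H))                 # predicted inputs

cost = 0
constraints = [X[:, 0] == x]
for k in range(H):
    cost += cp.quad_form(X[:, k], Q) + cp.quad_form(U[:, k], R)
    constraints += [X[:, k + 1] == A @ X[:, k] + B @ U[:, k],
                    cp.abs(U[:, k]) <= 1.0]        # actuator saturation
cost += cp.quad_form(X[:, H], Q)                   # terminal cost V_f
problem = cp.Problem(cp.Minimize(cost), constraints)

state = np.array([5.0, 0.0])            # start 5 m from the origin
for t in range(50):
    x.value = state
    problem.solve(solver=cp.OSQP)       # QP solved afresh each step
    u_star = U.value[:, 0]              # apply only the first action
    state = A @ state + B @ u_star      # plant evolves (model = plant here)
```

In a real deployment the last line would be replaced by the physical system's response, and the mismatch between model and plant is exactly what the re-planning at each step absorbs.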
The form of the optimisation depends on $f$ and the costs. Linear MPC (linear $f$, quadratic $L$, polyhedral constraints) yields a quadratic program at each step, which off-the-shelf QP solvers (OSQP, qpOASES) solve in microseconds to milliseconds for typical problem sizes; this case is mature, with stability and recursive-feasibility guarantees available when the terminal cost $V_f$ and a terminal constraint set are chosen appropriately. Non-linear MPC applies sequential quadratic programming or interior-point methods to the non-convex problem; tools include ACADO, FORCES, CasADi, and acados. Sampling-based MPC, such as MPPI (Williams et al. 2017), avoids gradients entirely by drawing many random action sequences, simulating each through a (possibly learned) dynamics model, and updating the plan with a softmax-weighted average of the samples; MPPI underlies many recent robotics demos because it places no smoothness or differentiability requirements on the dynamics or costs.
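A bare-bones MPPI update is short enough to show in full: sample perturbed action sequences, roll each out, weight by exponentiated negative cost, and average. The pendulum dynamics, noise scale, and temperature below are illustrative assumptions, not values from the text.

```python
# Sketch of an MPPI planning step in NumPy, in the spirit of
# Williams et al. (2017).
import numpy as np

def dynamics(x, u):
    """Pendulum swing-up: x = [angle, angular velocity]."""
    theta, omega = x
    omega = omega + 0.05 * (-9.8 * np.sin(theta) + u)
    return np.array([theta + 0.05 * omega, omega])

def cost(x, u):
    theta, omega = x
    return (theta - np.pi) ** 2 + 0.1 * omega**2 + 0.001 * u**2

def mppi_step(x0, u_nom, n_samples=256, sigma=1.0, lam=1.0):
    H = len(u_nom)
    eps = sigma * np.random.randn(n_samples, H)   # action-sequence noise
    total = np.zeros(n_samples)
    for i in range(n_samples):                    # roll out every sample
        x = x0.copy()
        for k in range(H):
            u = u_nom[k] + eps[i, k]
            total[i] += cost(x, u)
            x = dynamics(x, u)
    w = np.exp(-(total - total.min()) / lam)      # softmax weights
    w /= w.sum()
    return u_nom + w @ eps                        # weighted-average update

# Receding-horizon use: apply the first action, shift for warm-starting.
x = np.array([0.0, 0.0])
u_nom = np.zeros(30)
for t in range(100):
    u_nom = mppi_step(x, u_nom)
    x = dynamics(x, u_nom[0])
    u_nom = np.roll(u_nom, -1); u_nom[-1] = 0.0
```

Nothing in `dynamics` or `cost` needs to be differentiable, which is why this scheme extends so readily to learned simulators and contact-rich robotics tasks.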
MPC's strengths are explicit handling of constraints (an unconstrained optimal controller such as LQR cannot enforce "do not exceed 80% torque"), explicit modelling of dynamics (so the controller anticipates rather than reacts), and modular cost design (tracking, smoothness, energy, and safety terms combine in one objective). Its weaknesses are computational cost (each step solves a non-trivial optimisation), reliance on an accurate model (model errors compound over the horizon), and, in non-convex problems, fragility to the optimiser converging to bad local minima.
Real-world deployments span an enormous range. Industrial process control (refineries, power plants) has used linear MPC for decades. Autonomous vehicles run MPC for trajectory tracking and lane-change planning. Quadrotor flight controllers run non-linear MPC at hundreds of Hertz on embedded processors. SpaceX rockets land using a convex-optimisation-based MPC variant (lossless convexification of the powered descent guidance problem). Boston Dynamics' Atlas uses whole-body MPC for dynamic locomotion. In modern reinforcement learning, MPC with a learned dynamics model is the foundation of model-based RL: the dynamics model takes the place of $f$, and the policy is implicit in the optimiser. The conceptual link to Dreamer-style world models is direct.
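As a sketch of that pattern, the snippet below fits a dynamics model to logged transitions and uses it as $f$. The linear least-squares model class and the fabricated data are illustrative assumptions; in practice the model would typically be a neural network trained on real experience.

```python
# Model-based-RL pattern: learn f from data, then plan through it.
import numpy as np

rng = np.random.default_rng(0)

# Logged transitions (state, action, next state) from some real system;
# here we fabricate them from dynamics the planner never sees directly.
A_true = np.array([[1.0, 0.1], [0.0, 0.98]])
B_true = np.array([[0.0], [0.1]])
S = rng.normal(size=(500, 2))
U = rng.normal(size=(500, 1))
S_next = S @ A_true.T + U @ B_true.T

# Fit f_hat(x, u) = [A B] @ [x; u] by least squares: this learned model
# takes the place of f in the MPC optimisation.
Z = np.hstack([S, U])
AB, *_ = np.linalg.lstsq(Z, S_next, rcond=None)
f_hat = lambda x, u: np.concatenate([x, u]) @ AB   # learned dynamics

# Any of the planners above (QP, SQP, MPPI) can now roll out f_hat in
# place of the true dynamics; the "policy" is whatever the optimiser
# returns when queried at the current state.
```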
Related terms: World Model, Reinforcement Learning, Markov Decision Process
Discussed in:
- Chapter 12: Sequence Models, Robotics and Control