17.3 Autonomous vehicles

Autonomous driving is the application that has consumed the most capital with the most equivocal returns. Cumulative investment from 2015 to 2025 is estimated at well over $200 billion. Robotaxi services do operate at limited scale in 2026: Waymo carries around 500,000 paid rides per week (April 2026), with a public goal of 1 million per week by year end and a fleet of roughly 3,000 robotaxis across Phoenix, San Francisco, Los Angeles and Austin, and Baidu's Apollo Go runs in over a dozen Chinese cities. But the original vision of widespread Level 4 autonomy on consumer vehicles has not arrived.

The standard stack

Autonomous-driving systems decompose conventionally into four modules.

Perception detects and tracks objects in the vehicle's environment. Inputs are some combination of cameras (8 to 12 in modern systems), lidar (rotating or solid-state, range 100 to 300 metres), radar (forward and corner-mounted, all-weather), ultrasonics for parking, and high-precision GPS plus inertial measurement units for ego-state. Outputs are typically a list of detected objects with positions, velocities and class labels (vehicle, pedestrian, cyclist, traffic sign), together with a drivable-surface segmentation and the road-network topology around the vehicle. Architectures have moved over the last five years from per-camera 2D detection followed by fusion, to bird's-eye-view (BEV) representations computed end-to-end. BEVFormer (Li and colleagues, ECCV 2022) and the related BEVFusion (2022) project multi-camera features into a top-down grid using transformer attention; this representation is more naturally consumed by downstream prediction and planning.
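As a toy illustration of the BEV representation (not BEVFormer's learned projection, which operates on camera features rather than detected objects), a perception output list can be rasterised into a top-down grid; all names and parameters here are illustrative:

```python
import numpy as np

def objects_to_bev(objects, grid_size=200, cell_m=0.5):
    """Rasterise detected objects into a top-down (bird's-eye-view) grid.

    `objects` is a list of (x, y, class_id) tuples in the ego frame, in
    metres, with the vehicle at the grid centre. A minimal stand-in for
    the learned BEV feature grids used by systems like BEVFormer.
    """
    grid = np.zeros((grid_size, grid_size), dtype=np.int32)
    half = grid_size // 2
    for x, y, class_id in objects:
        col = int(x / cell_m) + half   # forward axis maps to columns
        row = int(y / cell_m) + half   # lateral axis maps to rows
        if 0 <= row < grid_size and 0 <= col < grid_size:
            grid[row, col] = class_id
    return grid

# A pedestrian (class 2) 10 m ahead and a vehicle (class 1) 20 m ahead,
# 3 m to the side:
bev = objects_to_bev([(10.0, 0.0, 2), (20.0, 3.0, 1)])
```

The point of the grid form is that prediction and planning can consume one spatially aligned tensor instead of per-camera detections that must be re-associated downstream.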

Prediction forecasts the motion of other agents over the next few seconds. Models are typically multimodal: rather than predicting a single trajectory per agent, they output a set of plausible futures with associated probabilities. Architectures include MultiPath (Waymo, 2019), VectorNet (2020), Wayformer (2022) and most recently scene-level transformers that reason about agent interactions jointly.
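The multimodal output format can be sketched as a set of weighted trajectories; the function below is a toy stand-in for the output head of models like MultiPath, and simply selects the highest-probability mode:

```python
import numpy as np

def most_likely_future(modes):
    """Select the highest-probability mode from a multimodal forecast.

    `modes` is a list of (probability, trajectory) pairs, where a
    trajectory is a (T, 2) array of future x, y positions, one row per
    timestep. Real models emit these from a learned head; here they are
    hand-written inputs.
    """
    probs = np.array([p for p, _ in modes])
    probs = probs / probs.sum()          # renormalise defensively
    best = int(np.argmax(probs))
    return probs[best], modes[best][1]

straight = np.array([[1.0, 0.0], [2.0, 0.0], [3.0, 0.0]])
left_turn = np.array([[1.0, 0.2], [1.8, 0.8], [2.2, 1.6]])
p, traj = most_likely_future([(0.7, straight), (0.3, left_turn)])
```

A planner would typically consume all modes weighted by probability rather than just the argmax, but the data structure is the same.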

Planning chooses the vehicle's own trajectory given perception and prediction outputs. Classical approaches use sample-based planners (RRT, hybrid A*) or optimisation-based methods (model-predictive control over a polynomial trajectory representation). Recent learned approaches include UniAD (Hu and colleagues, CVPR 2023 best paper) and Waymo's MotionLM, which integrate perception, prediction and planning into a single transformer trained end-to-end on driving demonstrations.
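A minimal sketch of sample-based planning in the classical style, far simpler than hybrid A* or polynomial MPC: straight-line candidates at fixed lateral offsets are scored by obstacle clearance and lane deviation. All names and weights are illustrative:

```python
import numpy as np

def choose_lateral_offset(obstacles, candidates=(-2.0, -1.0, 0.0, 1.0, 2.0),
                          horizon_m=30.0, n_points=30, w_dev=0.1):
    """Pick a lateral offset by sampling candidate paths and scoring them.

    Each candidate is a straight path at a fixed lateral offset; the cost
    trades off clearance from obstacles (a list of (x, y) points in the
    ego frame) against deviation from the lane centre.
    """
    xs = np.linspace(0.0, horizon_m, n_points)
    best_offset, best_cost = None, float("inf")
    for d in candidates:
        path = np.stack([xs, np.full_like(xs, d)], axis=1)
        if obstacles:
            clearance = min(
                float(np.min(np.linalg.norm(path - np.array(ob), axis=1)))
                for ob in obstacles
            )
        else:
            clearance = float("inf")
        cost = 1.0 / (clearance + 1e-6) + w_dev * abs(d)
        if cost < best_cost:
            best_offset, best_cost = d, cost
    return best_offset

# An obstacle on the lane centre 15 m ahead pushes the plan sideways:
offset = choose_lateral_offset([(15.0, 0.0)])
```

Production planners sample far richer trajectory families (curvature, speed profiles) and score against comfort, legality and predicted agent motion, but the sample-and-score structure is the same.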

Control translates the planned trajectory into steering, throttle and brake commands, using PID or MPC controllers. It is typically the least researched module because classical controllers already work well at this layer.
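A PID speed controller of the kind used at this layer can be sketched in a few lines; the vehicle dynamics and gains below are toy illustrations, not production values:

```python
class PID:
    """Minimal PID controller of the kind used for low-level vehicle control.

    Called once per control tick with the current error (e.g. speed error
    in m/s, or lateral offset in metres); returns an actuator command.
    """
    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = None

    def step(self, error):
        self.integral += error * self.dt
        derivative = 0.0 if self.prev_error is None \
            else (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

# Track a 10 m/s speed setpoint with a crude first-order vehicle model:
pid = PID(kp=0.8, ki=0.2, kd=0.05, dt=0.1)
speed = 0.0
for _ in range(200):
    throttle = pid.step(10.0 - speed)
    speed += throttle * 0.1   # toy dynamics: acceleration proportional to throttle
```

The integral term removes steady-state error (e.g. on a grade); real deployments add anti-windup and output saturation, omitted here.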

Tesla FSD versus Waymo: two philosophies

The two leading approaches embody different bets about what autonomy requires.

Tesla has pursued vision-only autonomy since dropping radar in 2021 and ultrasonic sensors in 2022. Full Self-Driving (FSD) v12, released in 2024, replaced Tesla's hand-coded planning logic with an end-to-end neural network trained on what Tesla calls billions of miles of driving video from the consumer fleet. FSD v13 (December 2024) extended this to highway driving. FSD operates as a Level 2 advanced driver-assistance system in regulatory terms: the driver is required to remain attentive. Tesla's Robotaxi service, launched in Austin in June 2025 using Model Y vehicles with safety operators, is the company's first commercial autonomous-ride product; the purpose-built two-seater Cybercab unveiled in October 2024 is targeted for production in 2026.

The Tesla bet is that data scale plus a vision-only sensor suite plus end-to-end neural-network policy will reach human-level safety. The advantages are cost (cameras are cheap), production-vehicle sensor calibration, and the ability to leverage the consumer fleet for data collection. The disadvantages are lack of robust depth estimation in adverse conditions and the absence of redundant modalities for safety-critical scenarios.

Waymo uses a sensor suite of 4 lidars, 6 radars and 29 cameras on its fifth-generation Driver platform, with extensive use of HD maps. Its planning stack remains more modular than Tesla's, with explicit prediction and planning modules. Waymo operates at Level 4 in defined operational design domains (ODDs) within Phoenix, San Francisco, Los Angeles and Austin, with no safety driver in the vehicle. As of early 2026 Waymo had logged well over 50 million autonomous miles in commercial service.

The Waymo bet is that redundant sensing, careful ODD restriction, and gradual expansion produce the safety record needed for deployment. The advantages are a much lower disengagement rate and a service that actually operates without a human in the vehicle. The disadvantages are high vehicle cost (the sensor suite alone is reportedly tens of thousands of pounds) and the operational expense of HD maps.

The disengagement rate

The standard public metric is the disengagement rate, the number of times per mile that a human safety driver takes control. California requires testing companies to file annual disengagement reports with the DMV. In 2023, Waymo reported approximately 17,000 miles per disengagement on its testing fleet; in 2024 the figure rose above 25,000. Tesla, which operates FSD as a consumer ADAS rather than as Level 4 testing, does not file such reports and the comparable figures from third-party testing (the Tesla Owners Club, Mobileye independent assessments) are roughly 100 to 1,000 miles per disengagement, with substantial variance by route and conditions.
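The arithmetic behind the headline figure is trivial but worth making explicit, since reports publish total miles and event counts separately. A minimal sketch with hypothetical figures, not taken from any actual filing:

```python
def miles_per_disengagement(total_miles, disengagements):
    """Convert raw report figures into the miles-per-disengagement metric."""
    return total_miles / disengagements

# Hypothetical figures for illustration: 1.7 million test miles with
# 100 disengagements yields 17,000 miles per disengagement.
fleet = miles_per_disengagement(1_700_000, 100)
```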

The disengagement metric is imperfect. Companies define disengagements differently, the comparator depends on the test environment, and "disengagement" undercounts near misses. Nonetheless, the order-of-magnitude gap between Waymo's commercial fleet and Tesla's consumer ADAS reflects the difference between a Level 4 service in restricted ODDs and a Level 2 system for which the driver remains responsible.

The simulation flywheel

A common element across mature programmes is heavy reliance on simulation. Real-world miles are slow and expensive; simulated miles are fast and cheap. The pattern is to mine real driving data for "interesting" scenarios (near-collisions, unusual agent behaviour, rare road geometries), replay them in simulation with the autonomy stack under test, generate variants by perturbing the agents' behaviours, and use these for both regression testing and reinforcement-learning fine-tuning. Waymo's Carcraft, Tesla's simulation pipeline, NVIDIA Drive Sim and the open-source CARLA all serve this purpose.
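The replay-and-perturb step can be sketched minimally. Here a scenario is reduced to nominal agent speeds, a drastic simplification of what tools like Carcraft or CARLA actually replay; the function and parameter names are illustrative:

```python
import random

def perturb_scenario(scenario, n_variants=5, speed_jitter=0.2, seed=0):
    """Generate variants of a logged scenario by jittering agent speeds.

    `scenario` maps agent id to a nominal speed in m/s: a toy stand-in
    for the replay-and-perturb step of the simulation flywheel. Each
    variant scales every speed by a random factor in [1 - jitter, 1 + jitter].
    """
    rng = random.Random(seed)   # fixed seed for reproducible test suites
    variants = []
    for _ in range(n_variants):
        variants.append({
            agent: speed * (1.0 + rng.uniform(-speed_jitter, speed_jitter))
            for agent, speed in scenario.items()
        })
    return variants

variants = perturb_scenario({"cyclist": 4.0, "lead_car": 12.0})
```

Real pipelines perturb trajectories, timing and intent rather than a single scalar, but the principle of fanning one logged event into many test cases is the same.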

The simulation–reality gap remains substantial. Lighting, weather, sensor noise, and especially the behaviour of other agents are imperfectly modelled. Closing this gap is an active research area (see Section 17.4 on robotics for related work).

Safety and casualty statistics

Comparing human and autonomous-vehicle safety is harder than headline numbers suggest. The US has roughly 1.3 fatalities per 100 million vehicle miles travelled, with most fatalities in higher-risk conditions (rural roads, late at night, drivers under the influence). Autonomous vehicles have so far been deployed mostly in benign conditions (urban centres, well-lit streets, fair weather), where the human baseline is much lower than the overall average. Comparing Waymo's per-mile safety record against the unconditional US fatality rate would therefore flatter the autonomous system; the fair comparison is against the human baseline in the same conditions.

Waymo's published 2024 safety report, covering 22 million rider-only miles to the end of June 2024, claimed a 73% reduction in injury-causing crashes and an 84% reduction in airbag-deployment crashes versus the human baseline in its operational areas; an updated 2025 analysis (over 56.7 million rider-only miles to January 2025) reported even larger reductions. Independent verification by the RAND Corporation has produced broadly consistent estimates. Tesla's published Vehicle Safety Report metrics use an idiosyncratic comparator that has been widely critiqued. The 2018 Uber ATG fatality in Tempe, Arizona (Elaine Herzberg) and the various Tesla Autopilot fatalities have shaped the regulatory landscape and the public-trust environment.
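The fair-comparison arithmetic is simple once condition-matched rates are available. A minimal sketch; the rates below are illustrative placeholders, not Waymo's actual figures:

```python
def percent_reduction(av_rate, human_rate):
    """Percentage reduction of the AV crash rate versus the human baseline.

    Both rates must be per-mile figures from the *same* conditions (same
    roads, times and weather); dividing by the unconditional national rate
    would overstate the benefit, since AVs drive the easy miles.
    """
    return 100.0 * (1.0 - av_rate / human_rate)

# Illustrative placeholders: 0.27 vs 1.0 injury crashes per million miles
reduction = percent_reduction(0.27, 1.0)
```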
