Glossary

Tesla FSD

Tesla FSD (Full Self-Driving) is the autonomous-driving software stack that ships in Tesla vehicles, notable in the industry for its commitment to a vision-only sensor suite (eight surround cameras, no lidar, no radar after 2021) and its progressive shift from a modular pipeline to a near-end-to-end neural network from camera pixels to vehicle controls.

Early Autopilot (2014–2018) was a classical perception → planning stack with hand-coded sensor fusion. Around 2019 Tesla introduced HydraNet: a single shared image backbone whose features fan out into many task-specific heads (lane lines, traffic lights, stop signs, vehicles, pedestrians, cones, drivable space, depth). HydraNet allowed a shared representation across tasks, but the downstream planner remained a rule-based C++ system. The next major redesign (2021) introduced the Bird's-Eye-View (BEV) network: per-camera features were lifted into a top-down vehicle-frame representation by a transformer with learned positional queries, giving the planner a unified spatial map without explicit calibration code.
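The structural shift is easier to see in code. Below is a minimal PyTorch sketch of both ideas, a shared trunk fanning out into task heads (the HydraNet pattern) and learned queries lifting camera features into a BEV grid; every module name, layer size and head here is an illustrative assumption, not Tesla's actual network:

```python
# Illustrative sketch only: shared backbone + task heads (HydraNet)
# plus learned BEV queries cross-attending over per-camera features.
import torch
import torch.nn as nn

class HydraNetSketch(nn.Module):
    def __init__(self, feat_dim=256, bev_size=32):
        super().__init__()
        # Shared image backbone: one trunk for every camera and every task.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, feat_dim, 7, stride=4, padding=3), nn.ReLU(),
            nn.Conv2d(feat_dim, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Task heads fan out from the shared features (HydraNet).
        self.heads = nn.ModuleDict({
            "lane_lines":     nn.Conv2d(feat_dim, 1, 1),
            "traffic_lights": nn.Conv2d(feat_dim, 4, 1),
            "vehicles":       nn.Conv2d(feat_dim, 6, 1),
            "drivable_space": nn.Conv2d(feat_dim, 1, 1),
        })
        # One learned query per top-down grid cell (the 2021 BEV lift).
        self.bev_queries = nn.Parameter(torch.randn(bev_size * bev_size, feat_dim))
        self.cross_attn = nn.MultiheadAttention(feat_dim, num_heads=8, batch_first=True)

    def forward(self, images):                        # (B, n_cams, 3, H, W)
        b, n = images.shape[:2]
        feats = self.backbone(images.flatten(0, 1))   # (B*n, D, h, w)
        d, h, w = feats.shape[1:]

        # Per-camera, per-task predictions from the shared trunk.
        outputs = {name: head(feats) for name, head in self.heads.items()}

        # BEV lift: grid-cell queries attend over all cameras jointly,
        # replacing hand-written projection/calibration code.
        kv = feats.view(b, n, d, h, w).permute(0, 1, 3, 4, 2).reshape(b, n * h * w, d)
        q = self.bev_queries.unsqueeze(0).expand(b, -1, -1)
        outputs["bev_map"], _ = self.cross_attn(q, kv, kv)
        return outputs

model = HydraNetSketch()
preds = model(torch.randn(2, 8, 3, 64, 64))   # eight surround cameras
print(preds["bev_map"].shape)                 # torch.Size([2, 1024, 256])
```

The production trunk is far larger (Tesla has described RegNet backbones with BiFPN feature fusion), but the wiring is the point: one shared representation, many cheap heads, and a learned lift into the vehicle frame.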

The 2022 architecture moved further: Occupancy Networks replaced bounding-box detection with a dense 3D voxel grid in which each voxel carries an occupancy probability, a semantic class and a velocity vector (the occupancy flow head). This lets the system reason about generic "stuff in space" (debris, dropped mufflers, unusual vehicles) without enumerating object classes. Trained end-to-end against video clips with 4D auto-labels (geometry triangulated across time-synchronised camera streams), the occupancy network supplies the planner with a much richer scene description than 2D bounding boxes.
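A toy sketch of such a head, assuming fused scene features arrive as one vector per bird's-eye-view column; all names, shapes and sizes are hypothetical:

```python
# Toy occupancy head: per-voxel occupancy, semantics and velocity.
import torch
import torch.nn as nn

class OccupancyHead(nn.Module):
    def __init__(self, feat_dim=256, n_classes=16, grid=(100, 100, 8)):
        super().__init__()
        self.grid, self.n_classes = grid, n_classes
        z = grid[2]
        self.occ  = nn.Linear(feat_dim, z)              # per-voxel occupancy logits
        self.sem  = nn.Linear(feat_dim, z * n_classes)  # per-voxel semantic logits
        self.flow = nn.Linear(feat_dim, z * 3)          # per-voxel velocity (occupancy flow)

    def forward(self, col_feats):                       # (B, X*Y, D)
        b, (x, y, z) = col_feats.shape[0], self.grid
        occupancy = torch.sigmoid(self.occ(col_feats)).view(b, x, y, z)
        semantics = self.sem(col_feats).view(b, x, y, z, self.n_classes)
        velocity  = self.flow(col_feats).view(b, x, y, z, 3)
        return occupancy, semantics, velocity

head = OccupancyHead()
occ, sem, vel = head(torch.randn(1, 100 * 100, 256))
print(occ.shape, vel.shape)  # (1, 100, 100, 8) and (1, 100, 100, 8, 3)
```

A planner consuming this grid can treat any sufficiently occupied voxel as an obstacle, whether or not a detector has a class for it.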

FSD v12 (2024) made the most consequential leap: the rule-based planner was largely replaced by an end-to-end neural policy mapping image inputs to control outputs. The planner had previously contained roughly 300,000 lines of hand-written C++; v12 reportedly deleted around two-thirds of it. The new policy is trained on millions of clips of human driving, with imitation-learning losses and shadow-mode comparison against the legacy planner. Inference runs on Tesla's bespoke FSD computer (HW3, and now HW4); HW3 pairs two custom 14 nm SoCs whose NPUs deliver roughly 144 TOPS combined, with the duplicated chips providing redundancy as a safety measure.
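In its simplest form, the imitation-learning signal looks like the sketch below: a policy regresses the controls a human driver actually applied. The network, shapes and data are stand-ins, not Tesla's training pipeline:

```python
# Minimal imitation-learning sketch with random stand-in data.
import torch
import torch.nn as nn

policy = nn.Sequential(
    nn.Flatten(),                        # (B, 8, 3, 64, 64) -> (B, 98304)
    nn.Linear(8 * 3 * 64 * 64, 512), nn.ReLU(),
    nn.Linear(512, 2),                   # outputs: [steering, acceleration]
)
opt = torch.optim.AdamW(policy.parameters(), lr=1e-4)

def train_step(frames, human_controls):
    """frames: camera clip tensor; human_controls: recorded (steer, accel)."""
    loss = nn.functional.smooth_l1_loss(policy(frames), human_controls)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# Hypothetical usage:
print(train_step(torch.randn(4, 8, 3, 64, 64), torch.randn(4, 2)))
```

The shadow-mode comparison mentioned above would sit outside a loop like this: the legacy planner runs in parallel and disagreements flag clips for audit or for addition to the training set.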

The vision-only thesis is contested. Proponents argue that cameras are sufficient because humans drive with vision alone, and that lidar adds cost, complexity and a discrepancy between training-time and test-time sensors. Critics point to weather degradation, sun glare, low-light pedestrian detection and the fundamental ambiguity of monocular depth, and note that Waymo, Cruise and Mobileye all retain lidar. Empirically, Tesla publishes crash statistics and claimed mileage between disengagements, but independent verification is limited. As of 2026 FSD is a Level 2 driver-assist system requiring constant supervision, not an autonomous taxi, and has not received approval for unsupervised operation in any jurisdiction.

Related terms: Convolutional Neural Network, Vision Transformer, Autonomous Driving Stack, Waymo Driver, Deep Learning
