Glossary

Llama 3 / 3.1 / 3.3

Llama 3 is the third major generation of Meta's open-weights language model family. The initial Llama 3 release in April 2024 comprised 8B and 70B variants; Llama 3.1 (July 2024) added the 405B flagship and extended context to 128K tokens; and Llama 3.2 (September 2024) and Llama 3.3 (December 2024) added multimodal and efficiency-focused variants. The series was Meta's largest investment in open-weights AI to date and shaped the open ecosystem from 2024 through early 2026.

Sizes and training.

  • Llama 3 8B / 70B were trained on roughly 15 trillion tokens of public text and code, an order of magnitude more than Llama 2.
  • Llama 3.1 405B is a dense (non-MoE) transformer trained on 15.6 trillion tokens using 16,000 H100 GPUs for several months. Meta's technical report disclosed the training infrastructure in unusual detail, including failure rates, scheduler design and parallelism strategy.
  • Llama 3.2 1B / 3B are small on-device variants distilled from larger models.
  • Llama 3.2 11B / 90B Vision add image understanding via a separately trained vision encoder connected to the language model through cross-attention adapters, leaving the text weights unchanged.
  • Llama 3.3 70B is a December 2024 update of the 70B with substantially improved post-training, narrowing the gap to the 405B at a fraction of the inference cost.

Architecture. Standard decoder-only transformer with rotary position embeddings (RoPE), grouped-query attention, SwiGLU (SiLU-gated) feed-forward layers and a tokeniser with a 128K-entry vocabulary. The 405B uses 126 transformer layers, a hidden dimension of 16,384, and 128 attention heads. No mixture-of-experts: Meta deliberately chose dense scaling for ease of deployment and reproducibility.
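The practical payoff of grouped-query attention is a smaller KV cache at inference time. The sketch below estimates the per-token cache for the 405B dimensions given above; the figure of 8 key/value heads is reported in Meta's Llama 3 paper, and fp16 cache storage is an assumption for illustration.

```python
def kv_cache_bytes_per_token(n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    # Two cached tensors per layer (K and V), each of shape
    # (n_kv_heads, head_dim), stored in fp16 (2 bytes per element).
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem

# Llama 3.1 405B dimensions from the entry above; 8 KV heads per the paper.
n_layers, hidden, n_heads, n_kv_heads = 126, 16_384, 128, 8
head_dim = hidden // n_heads  # 128

gqa = kv_cache_bytes_per_token(n_layers, n_kv_heads, head_dim)
mha = kv_cache_bytes_per_token(n_layers, n_heads, head_dim)  # hypothetical full multi-head cache

print(gqa)                         # bytes of KV cache per token with GQA
print(mha // gqa)                  # cache shrinks by the head ratio, 128/8 = 16x
print(gqa * 128 * 1024 / 2**30)    # GiB of cache for a full 128K-token context
```

Even with the 16x saving from GQA, a full 128K-token context still costs tens of GiB of cache at these dimensions, which is part of why long-context serving of the 405B remains expensive.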

Performance. Llama 3.1 405B was the first open-weights model to reach GPT-4-class performance on standard benchmarks (MMLU, HumanEval, GSM8K, MATH). Llama 3.3 70B is competitive with GPT-4o on many tasks at a fraction of the size. The 8B model runs comfortably on a laptop; the 1B and 3B models run on phones.

Licensing. Llama 3 ships under the Llama Community License, which allows commercial use with two notable restrictions: companies with more than 700 million monthly active users must request a separate licence, and downstream products must include "Built with Llama" attribution. The licence is more permissive than its critics suggest but is not OSI-approved open-source; Llama 3 weights are best described as open weights, not open source.

Significance. The 405B release in particular reset expectations about how good open-weights models could be. It enabled a wave of self-hosting, on-premises deployment for regulated industries, fine-tuning research, and academic study. Llama 3 is the substrate for thousands of derivative models on Hugging Face, including domain-specialised medical, legal and code models. Together with DeepSeek-V3 and Qwen 2.5/3 it defines the open-weights frontier as of early 2026, even as the gap to closed frontier models has narrowed but not disappeared.

Related terms: Transformer, Mixture of Experts, DeepSeek-V3, RLHF

This site is currently in Beta. Contact: Chris Paton
