CS229 · Stanford University · 2018
Machine Learning
with Andrew Ng
Your progress in this browser
Lectures · 0 / 15 watched
Quiz · 0 / 8 correct
Progress is stored in this browser only — there is no account, no login, and no database. Clearing your browser data will reset it.
About the course
CS229 is the course that taught a generation of machine-learning practitioners their craft. Andrew Ng's autumn 2018 Stanford lectures are the most widely watched version: twenty hours of whiteboard, plus problem-set discussions, covering supervised learning (linear and logistic regression, GLMs, SVMs, decision trees, neural networks), learning theory (bias-variance, VC dimension, regularisation), unsupervised learning (k-means, mixtures of Gaussians, EM, PCA, ICA), reinforcement learning, and a quick tour of deep learning before it had eaten the field.
The course is mathematical without being daunting. Ng works derivations live, says out loud where the linear-algebra identity comes from, and is honest about which claims are heuristic. If you have read our linear algebra, calculus, probability, and ML-fundamentals chapters and want to see those tools applied end-to-end by a careful expositor, this is the course to watch.
Note: this is the 2018 cohort. The course at Stanford has since been split into CS229, CS230 (deep learning), CS231n (vision), and CS236 (generative models), each of which goes deeper into its area. Ng's 2018 version remains the best single starting point because it shows the connective tissue between them.
Watch the lectures
Syllabus
Tick lectures as you finish them. Your ticks live in this browser only.
- Andrew Ng · What is supervised learning? Hypothesis class, cost function. Batch and stochastic gradient descent. Normal equations.
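As a taste of this lecture's material, here is a minimal NumPy sketch (toy data and invented parameters, not course code) showing that batch gradient descent and the normal equations reach the same least-squares fit:

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.c_[np.ones(50), rng.normal(size=(50, 2))]   # design matrix with intercept column
true_theta = np.array([1.0, 2.0, -3.0])            # invented ground truth
y = X @ true_theta + 0.01 * rng.normal(size=50)

# Normal equations: solve X^T X theta = X^T y in one step
theta_ne = np.linalg.solve(X.T @ X, X.T @ y)

# Batch gradient descent on the least-squares cost J(theta) = ||X theta - y||^2 / (2n)
theta_gd = np.zeros(3)
for _ in range(2000):
    grad = X.T @ (X @ theta_gd - y) / len(y)       # full-batch gradient
    theta_gd -= 0.1 * grad                          # fixed learning rate (illustrative)
```

Both routes should agree to several decimal places; the trade-off the lecture discusses is that the normal equations cost a matrix solve while gradient descent trades many cheap steps for it.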
- Andrew Ng · Locally weighted regression. Logistic regression and its connection to the Bernoulli distribution. Newton's method for logistic regression.
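A hedged sketch of Newton's method for logistic regression, with made-up one-feature data; every name and setting here is illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
X = np.c_[np.ones(100), rng.normal(size=100)]                   # intercept + one feature
y = (X[:, 1] + 0.5 * rng.normal(size=100) > 0).astype(float)    # noisy binary labels

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

theta = np.zeros(2)
for _ in range(10):
    h = sigmoid(X @ theta)
    grad = X.T @ (h - y)                        # gradient of the negative log-likelihood
    H = X.T @ (X * (h * (1 - h))[:, None])      # Hessian: X^T diag(h(1-h)) X
    theta -= np.linalg.solve(H, grad)           # Newton step
```

The point of the lecture's derivation is visible here: each iteration solves a small linear system, and a handful of iterations usually suffices where plain gradient descent would need thousands.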
- Andrew Ng · Exponential-family distributions, the GLM recipe, softmax regression as a multi-class GLM.
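Softmax regression drops out of the GLM recipe almost mechanically. This toy NumPy sketch (invented data, three classes cut from one feature) shows the resulting gradient, which has the same prediction-minus-target shape as linear and logistic regression:

```python
import numpy as np

rng = np.random.default_rng(2)
X = np.c_[np.ones(150), rng.normal(size=(150, 2))]
y = np.digitize(X[:, 1], [-0.5, 0.5])       # classes 0/1/2 by thresholding a feature
Y = np.eye(3)[y]                            # one-hot targets

def softmax(Z):
    Z = Z - Z.max(axis=1, keepdims=True)    # subtract row max for numerical stability
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

Theta = np.zeros((3, 3))                    # one parameter row per class
for _ in range(1000):
    P = softmax(X @ Theta.T)                # predicted class probabilities, shape (n, 3)
    Theta -= 0.5 * (P - Y).T @ X / len(y)   # GLM gradient: (prediction - target) * x
```

That the update looks identical across linear, logistic, and softmax regression is exactly the payoff of the exponential-family view.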
- Andrew Ng · Gaussian discriminant analysis, naive Bayes, Laplace smoothing. The discriminative vs generative split.
- Andrew Ng · Margin geometry, the optimisation problem, kernels, the kernel trick. SMO.
- Andrew Ng · The bias-variance decomposition, the union bound, VC dimension. Why uniform convergence works.
- Andrew Ng · Cross-validation, $L_1$ and $L_2$ regularisation, feature selection.
- Andrew Ng (guest) · CART, random forests, AdaBoost. Why ensembles work.
- Andrew Ng · Feed-forward networks, backpropagation, vanishing gradients, ReLU. Convolutional layers.
- Andrew Ng · Hard vs soft clustering. The EM algorithm, derived as coordinate ascent on a lower bound.
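The E-step/M-step alternation is compact enough to show in full for a one-dimensional two-Gaussian mixture. A hedged sketch with synthetic data (all settings invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.r_[rng.normal(-2, 1, 300), rng.normal(3, 1, 300)]   # two synthetic clusters

mu = np.array([-1.0, 1.0])        # initial component means
sigma = np.array([1.0, 1.0])      # initial standard deviations
pi = np.array([0.5, 0.5])         # mixing weights

def gauss(x, m, s):
    return np.exp(-0.5 * ((x - m) / s) ** 2) / (s * np.sqrt(2 * np.pi))

for _ in range(50):
    # E-step: posterior responsibility of each component for each point
    w = pi * gauss(x[:, None], mu, sigma)
    w /= w.sum(axis=1, keepdims=True)
    # M-step: re-estimate parameters from the soft assignments
    n = w.sum(axis=0)
    pi = n / len(x)
    mu = (w * x[:, None]).sum(axis=0) / n
    sigma = np.sqrt((w * (x[:, None] - mu) ** 2).sum(axis=0) / n)
```

Each loop iteration is one step of the coordinate ascent the lecture derives: the E-step tightens the lower bound, the M-step maximises it in the parameters.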
- Andrew Ng · PCA from the variance-maximisation view and from the reconstruction-error view. The connection to the SVD.
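The SVD connection is easy to verify numerically. This sketch (toy anisotropic data, illustrative only) computes the principal components both ways and checks they agree:

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(200, 3)) @ np.diag([3.0, 1.0, 0.1])   # anisotropic point cloud
Xc = X - X.mean(axis=0)                                    # centre the data first

# Route 1: eigenvectors of the empirical covariance matrix
cov = Xc.T @ Xc / len(Xc)
eigvals, eigvecs = np.linalg.eigh(cov)     # eigh returns ascending eigenvalues

# Route 2: right singular vectors of the centred data matrix
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

pc_eig = eigvecs[:, -1]   # top component from the eigendecomposition
pc_svd = Vt[0]            # top component from the SVD (may differ in sign)
```

The singular values also carry the variances: $s_i^2 / n$ equals the $i$-th eigenvalue of the covariance, which is why the SVD route is the numerically preferred one in practice.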
- Andrew Ng · ICA: non-Gaussian sources, the cocktail-party problem. Why Gaussian factors are unidentifiable.
- Andrew Ng · Markov decision processes, Bellman equations, value iteration, policy iteration.
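Value iteration is just the Bellman optimality backup applied until it stops changing. A hedged sketch on a tiny invented two-state MDP (transitions and rewards are made up for illustration):

```python
import numpy as np

# P[a, s, t] = probability of moving s -> t under action a (invented numbers)
P = np.array([[[0.9, 0.1],
               [0.1, 0.9]],      # action 0: mostly stay put
              [[0.5, 0.5],
               [0.5, 0.5]]])     # action 1: coin flip
R = np.array([[0.0, 1.0],        # R[s, a]: rewards in state 0 for actions 0, 1
              [1.0, 0.0]])       # rewards in state 1 for actions 0, 1
gamma = 0.9                      # discount factor

V = np.zeros(2)
for _ in range(500):
    # Bellman backup: V(s) <- max_a [ R(s, a) + gamma * sum_t P(t | s, a) V(t) ]
    Q = R + gamma * np.einsum('ast,t->sa', P, V)
    V = Q.max(axis=1)

policy = Q.argmax(axis=1)        # greedy policy with respect to the converged values
```

Because the backup is a $\gamma$-contraction, the loop converges geometrically regardless of the starting $V$, which is the fact the lecture proves.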
- Andrew Ng · Continuous-state MDPs, value-function approximation. Linear-quadratic regulators.
- Andrew Ng · What to debug when. The bias-variance triage. When to collect more data vs change the model.
Self-assessment
A short multiple-choice quiz. Click an option to commit; the correct answer and an explanation appear. Your answers are remembered in this browser.
- Question 1. In ordinary least squares with design matrix $X$ and target $\mathbf{y}$, the normal-equation solution is:
- Question 2. The logistic-regression loss is the negative log-likelihood under which assumed distribution for $y \mid \mathbf{x}$?
- Question 3. A discriminative model directly estimates:
- Question 4. In a soft-margin SVM, the slack variable $\xi_i$ represents:
- Question 5. Bias-variance: a high-bias / low-variance model on a fixed dataset typically:
- Question 6. The EM algorithm for a mixture of Gaussians is best described as:
- Question 7. PCA's principal components are the eigenvectors of:
- Question 8. In a Markov decision process, the Bellman equation for the optimal value function $V^*(s)$ is:
This site is currently in Beta. Contact: Chris Paton
AI tools used: Claude (research, coding, text), ChatGPT (diagrams, images), Grammarly (editing).