CS229 · Stanford University · 2018

Machine Learning

with Andrew Ng

Official course page →

Your progress in this browser

Lectures · 0 / 15 watched

Quiz · 0 / 8 correct

Progress is stored in this browser only — there is no account, no login, and no database. Clearing your browser data will reset it.

About the course

CS229 is the course that taught a generation of machine-learning practitioners their craft. Andrew Ng's autumn 2018 Stanford lectures are the most widely watched version: twenty hours of whiteboard, plus problem-set discussions, covering supervised learning (linear and logistic regression, GLMs, SVMs, decision trees, neural networks), learning theory (bias-variance, VC dimension, regularisation), unsupervised learning (k-means, mixtures of Gaussians, EM, PCA, ICA), reinforcement learning, and a quick tour of deep learning before it had eaten the field.

The course is mathematical without being daunting. Ng works derivations live, says out loud where each linear-algebra identity comes from, and is honest about which claims are heuristic. If you have read our linear algebra, calculus, probability, and ML-fundamentals chapters and want to see those tools applied end-to-end by a careful expositor, this is the course to watch.

Note: this is the 2018 cohort. Stanford's machine-learning offerings have since split across CS229, CS230 (deep learning), CS231n (vision), and CS236 (generative models), each of which goes deeper into its area. Ng's 2018 version remains the best single starting point because it shows the connective tissue between them.

Watch the lectures

Open the full playlist on YouTube →

Syllabus

Tick lectures as you finish them. Your ticks live in this browser only.

  1. Andrew Ng

    What is supervised learning. Hypothesis class, cost function. Batch and stochastic gradient descent. Normal equations.
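    Both fitting routes are sketched in code after this syllabus.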

  2. Andrew Ng

    Locally weighted regression. Logistic regression and its connection to the Bernoulli distribution. Newton's method for logistic regression.
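    Newton's method for this model is sketched after the syllabus.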

  3. Andrew Ng

    Exponential family distributions, the GLM recipe, softmax regression as multi-class GLM.
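    Softmax regression is sketched in code after the syllabus.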

  4. Andrew Ng

    Gaussian discriminant analysis, naive Bayes, Laplace smoothing. The discriminative vs generative split.

  5. Andrew Ng

    Margin geometry, the optimisation problem, kernels, the kernel trick. SMO.

  6. Andrew Ng

    The bias-variance decomposition, the union bound, VC dimension. Why uniform convergence works.

  7. Andrew Ng

    Cross-validation, $L_1$ and $L_2$ regularisation, feature selection.

  8. Guest lecturer

    CART, random forests, AdaBoost. Why ensembles work.

  9. Andrew Ng

    Feed-forward networks, backpropagation, vanishing gradients, ReLU. Convolutional layers.

  10. Andrew Ng

    Hard clustering vs soft clustering. The EM algorithm — derivation as coordinate ascent on a lower bound.
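    A minimal EM sketch for a mixture of Gaussians follows the syllabus.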

  11. Andrew Ng

    PCA from the variance-maximisation view and from the reconstruction-error view. The connection to the SVD.
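    The SVD route is sketched in code after the syllabus.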

  12. Andrew Ng

    ICA — non-Gaussian sources, cocktail-party problem. Why Gaussian factors are unidentifiable.

  13. Andrew Ng

    Markov decision processes, Bellman equations, value iteration, policy iteration.
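    Value iteration is sketched in code after the syllabus.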

  14. Andrew Ng

    Continuous-state MDPs, value-function approximation. Linear-quadratic regulators.

  15. Andrew Ng

    What to debug when. The bias-variance triage. When to collect more data vs change the model.
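The lecture summaries above name several algorithms that fit in a few lines of NumPy. The sketches that follow are illustrative only: they are not the course's own code, and the function names, learning rates, and synthetic data in them are assumptions made here for brevity. First, lecture 1's two routes to a least-squares fit, batch gradient descent and the normal equations.

```python
import numpy as np

# Synthetic data (assumed here for illustration): y = 1 + 2x plus noise.
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, size=100)
X = np.column_stack([np.ones(100), x])      # design matrix with an intercept column
y = 1 + 2 * x + 0.1 * rng.standard_normal(100)

# Batch gradient descent on J(theta) = (1/2m) * ||X theta - y||^2.
theta = np.zeros(2)
alpha = 0.5                                  # learning rate (assumed)
for _ in range(2000):
    grad = X.T @ (X @ theta - y) / len(y)    # gradient of the cost
    theta -= alpha * grad                    # stochastic GD would instead step on one example at a time

# Normal equations: theta = (X^T X)^{-1} X^T y, solved without forming the inverse.
theta_exact = np.linalg.solve(X.T @ X, X.T @ y)

print(theta, theta_exact)                    # both should be close to [1, 2]
```

Both routes should agree on this data; the normal equations are exact but scale roughly as $O(n^3)$ in the number of features, which is why gradient descent takes over on large problems.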
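For lecture 2, a minimal Newton's-method fit of logistic regression. Each step solves the linear system $H\,\Delta = \nabla$ rather than inverting the Hessian; the iteration count is an arbitrary assumption.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic_newton(X, y, n_iter=10):
    """Logistic regression by Newton's method (illustrative sketch).

    X: (m, n) design matrix including an intercept column; y: (m,) labels in {0, 1}.
    """
    theta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        h = sigmoid(X @ theta)                # predicted probabilities
        grad = X.T @ (h - y)                  # gradient of the negative log-likelihood
        W = h * (1.0 - h)                     # per-example Hessian weights
        H = X.T @ (X * W[:, None])            # Hessian: X^T diag(W) X
        theta -= np.linalg.solve(H, grad)     # Newton step
    return theta
```

On linearly separable data the maximum-likelihood weights diverge and $H$ becomes ill-conditioned; adding a small ridge term to $H$ is a common safeguard.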
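For lecture 3, softmax regression, the multi-class GLM, trained by batch gradient descent on the average cross-entropy; the learning rate and iteration count are assumptions.

```python
import numpy as np

def fit_softmax(X, y, k, alpha=0.1, n_iter=500):
    """Softmax (multinomial logistic) regression by batch gradient descent (sketch).

    X: (m, n) design matrix with an intercept column; y: (m,) integer labels in {0, ..., k-1}.
    """
    m, n = X.shape
    Theta = np.zeros((n, k))
    Y = np.eye(k)[y]                           # one-hot targets, shape (m, k)
    for _ in range(n_iter):
        Z = X @ Theta
        Z -= Z.max(axis=1, keepdims=True)      # stabilise the exponentials
        P = np.exp(Z)
        P /= P.sum(axis=1, keepdims=True)      # predicted class probabilities
        grad = X.T @ (P - Y) / m               # gradient of the average cross-entropy
        Theta -= alpha * grad
    return Theta
```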
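For lecture 10, EM for a one-dimensional mixture of Gaussians, written so the E-step (soft responsibilities) and M-step (weighted maximum-likelihood updates) are explicit. Hardening the responsibilities to 0/1 assignments gives a k-means-style algorithm. The initialisation and iteration count are assumptions.

```python
import numpy as np

def em_gmm_1d(x, k=2, n_iter=100):
    """EM for a one-dimensional mixture of k Gaussians (illustrative sketch)."""
    rng = np.random.default_rng(0)
    mu = rng.choice(x, size=k, replace=False)   # initial means: k data points (assumed)
    var = np.full(k, x.var())                   # initial variances
    pi = np.full(k, 1.0 / k)                    # mixing weights
    for _ in range(n_iter):
        # E-step: responsibilities r[i, j] = P(component j | x_i)
        dens = pi * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
        r = dens / dens.sum(axis=1, keepdims=True)
        # M-step: weighted maximum-likelihood updates
        nk = r.sum(axis=0)
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk
        pi = nk / len(x)
    return pi, mu, var
```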
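For lecture 11, PCA via the SVD of the centred data matrix: the right singular vectors are the eigenvectors of the sample covariance, which is exactly the connection the lecture draws.

```python
import numpy as np

def pca(X, k):
    """Top-k principal components via the SVD of the centred data (illustrative sketch)."""
    Xc = X - X.mean(axis=0)                     # centre each feature
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:k]                         # rows: directions of maximal variance
    scores = Xc @ components.T                  # projections of the data onto those directions
    explained_var = S[:k] ** 2 / (len(X) - 1)   # eigenvalues of the sample covariance
    return components, scores, explained_var
```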
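For lecture 13, value iteration on a finite MDP, with the reward written as a function of the state only; the discount factor and stopping tolerance are assumptions.

```python
import numpy as np

def value_iteration(P, R, gamma=0.95, tol=1e-8):
    """Value iteration for a finite MDP (illustrative sketch).

    P: (A, S, S) transition probabilities P[a, s, s']; R: (S,) reward per state;
    gamma: discount factor. Returns the optimal value function and a greedy policy.
    """
    A, S, _ = P.shape
    V = np.zeros(S)
    while True:
        Q = R + gamma * (P @ V)        # Q[a, s] = R(s) + gamma * sum_s' P[a, s, s'] V(s')
        V_new = Q.max(axis=0)          # Bellman optimality backup
        if np.max(np.abs(V_new - V)) < tol:
            break
        V = V_new
    policy = Q.argmax(axis=0)          # greedy policy with respect to V
    return V, policy
```

Policy iteration replaces this repeated max-backup sweep with alternating policy evaluation and greedy policy improvement.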

Self-assessment

A short multiple-choice quiz. Click an option to commit your answer; the correct answer and an explanation then appear. Your answers are remembered in this browser only.

  1. In ordinary least squares with design matrix $X$ and target $\mathbf{y}$, the normal-equation solution is:

  2. The logistic-regression loss is the negative log-likelihood under which assumed distribution for $y \mid \mathbf{x}$?

  3. A discriminative model directly estimates:

  4. In a soft-margin SVM, the slack variable $\xi_i$ represents:

  5. Bias-variance: a high-bias / low-variance model on a fixed dataset typically:

  6. The EM algorithm for a mixture of Gaussians is best described as:

  7. PCA's principal components are the eigenvectors of:

  8. In a Markov decision process, the Bellman equation for the optimal value function $V^*(s)$ is:

This site is currently in Beta. Contact: Chris Paton


AI tools used: Claude (research, coding, text), ChatGPT (diagrams, images), Grammarly (editing).