CS231n · Stanford University · 2017

Convolutional Neural Networks for Visual Recognition

with Fei-Fei Li, Justin Johnson, Serena Yeung

Official course page →

Your progress in this browser

Lectures · 0 / 12 watched

Quiz · 0 / 5 correct

Progress is stored in this browser only — there is no account, no login, and no database. Clearing your browser data will reset it.

About the course

CS231n trained a decade of vision researchers in their craft. The 2017 cohort, taught by Fei-Fei Li, Justin Johnson, and Serena Yeung, is the most widely watched and is the one we link to here. It covers what classical computer vision looked like before deep learning, the rise of the convolutional network from LeNet through AlexNet and ResNet, the details that make backpropagation through a deep CNN actually work (initialisation, batch normalisation, learning-rate schedules), and applications to detection, segmentation, video, and generative models.

The course is famously concrete. Lectures derive backpropagation through a computational graph by hand, walk through ImageNet architecture diagrams layer by layer, and discuss tricks like data augmentation and weight decay with the practical detail you would normally only get in a lab. After our CNNs chapter, the natural next step is to watch this course end to end.

Important caveat: the 2017 cohort predates the vision-transformer turn (ViT, 2020) and the diffusion-model turn (2020–2022). The course remains the right place to learn CNNs from, but you will need the modern-AI chapter and the MIT 6.S191 lectures to bring the picture up to date.

Watch the lectures

Open the full playlist on YouTube →

Syllabus

Tick lectures as you finish them. Your ticks live in this browser only.

  1. Fei-Fei Li

    What computer vision is, and the history from David Marr through SIFT to ImageNet. Why object recognition is hard.

  2. Serena Yeung

    The image-classification setup. Why nearest-neighbour fails as the dimensionality grows. The split into train/val/test.

  3. Justin Johnson

    SVM loss, softmax loss, regularisation. SGD with momentum. Stepwise vs end-to-end optimisation.

  4. Justin Johnson

    The computational-graph view of backprop. Vectorised gradients. The vanishing-gradient problem in deep networks.

  5. Justin Johnson

    Convolution, pooling, the canonical conv-relu-pool stack. Receptive fields. Why convolutions match images.

  6. Justin Johnson

    Activation functions, weight initialisation (Xavier, He), batch normalisation. Babysitting the learning process.

  7. Justin Johnson

    Optimisers — momentum, Adagrad, RMSProp, Adam. Learning-rate schedules. Regularisation: dropout, weight decay, data augmentation.

  8. Serena Yeung

    AlexNet, VGG, GoogLeNet, ResNet. The ImageNet leaderboard year by year. Residual connections and why they work.

  9. Justin Johnson

    R-CNN, Fast R-CNN, Faster R-CNN, YOLO. Semantic vs instance segmentation.

  10. Justin Johnson

    Image captioning, visual question answering, RNN + CNN hybrids.

  11. Serena Yeung

    Autoencoders, VAEs, GANs. Image generation and the inception-score / FID metrics.

  12. Serena Yeung

    Deep Q-learning for Atari, policy gradients, application to robotics.
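The conv and pooling arithmetic from lecture 5 fits in a few lines. This is a plain-Python sketch; the example layer shapes (AlexNet's first layer, a CIFAR-10-style stack) are standard illustrations, not taken from this page:

```python
def conv_output_size(size, kernel, pad, stride):
    """Spatial output size of a conv (or pooling) layer:
    (size - kernel + 2*pad) // stride + 1."""
    return (size - kernel + 2 * pad) // stride + 1

# AlexNet's first layer: 227x227 input, 11x11 kernels, stride 4, no padding.
print(conv_output_size(227, 11, 0, 4))  # -> 55

# A CIFAR-10-style conv-relu-pool stack on a 32x32 input:
s = conv_output_size(32, 3, 1, 1)  # 3x3 conv, pad 1, stride 1 preserves size: 32
s = conv_output_size(s, 2, 0, 2)   # 2x2 max-pool, stride 2 halves it: 16
print(s)  # -> 16
```

The same formula answers the "does my padding preserve the spatial size?" question that comes up repeatedly in the architecture lectures: with stride 1, padding of (kernel − 1) / 2 keeps the output the same size as the input.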

Self-assessment

A short multiple-choice quiz. Click an option to commit; the correct answer and an explanation appear. Your answers are remembered in this browser.

  1. Question 1. In a convolutional layer with input volume $W \times H \times C$, kernel size $K \times K$, $F$ filters, stride $S$, and zero-padding $P$, the spatial output size is:

  2. Question 2. Why are residual (skip) connections useful in very deep networks?

  3. Question 3. He initialisation scales weights so that the variance of activations and gradients is preserved across ReLU layers. The scale is:

  4. Question 4. Faster R-CNN improves on Fast R-CNN by:

  5. Question 5. Why is data augmentation (random crops, flips, colour jitter) effective for training image classifiers?

This site is currently in Beta. Contact: Chris Paton

Textbook of Usability · Textbook of Digital Health

Auckland Maths and Science Tutoring

AI tools used: Claude (research, coding, text), ChatGPT (diagrams, images), Grammarly (editing).