CS231n · Stanford University · 2017

Convolutional Neural Networks for Visual Recognition

with Fei-Fei Li, Justin Johnson, Serena Yeung

Official course page →

Your progress in this browser

Lectures · 0 / 12 watched

Quiz · 0 / 5 correct

Progress is stored in this browser only — there is no account, no login, and no database. Clearing your browser data will reset it.

About the course

CS231n trained a decade of vision researchers in their craft. The 2017 cohort, taught by Fei-Fei Li, Justin Johnson, and Serena Yeung, is the most widely watched and is the one we link to here. It covers what classical computer vision looked like before deep learning, the rise of the convolutional network from LeNet through AlexNet and ResNet, the details that make backpropagation through a deep CNN actually work (initialisation, batch normalisation, learning-rate schedules), and applications to detection, segmentation, video, and generative models.

The course is famously concrete. Lectures derive backpropagation through a computational graph by hand, walk through ImageNet architecture diagrams layer by layer, and discuss tricks like data augmentation and weight decay with the practical detail you would normally only get in a lab. After our CNNs chapter, the natural next step is to watch this course end to end.

Important caveat: the 2017 cohort predates the vision-transformer turn (ViT, 2020) and the diffusion-model turn (2020–2022). The course remains the right place to learn CNNs from, but you will need the modern-AI chapter and the MIT 6.S191 lectures to bring the picture up to date.

Watch the lectures

Open the full playlist on YouTube →

Syllabus

Tick lectures as you finish them. Your ticks live in this browser only.

  1. Fei-Fei Li

    What computer vision is, and the history from David Marr through SIFT to ImageNet. Why object recognition is hard.

  2. Serena Yeung

    The image-classification setup. Why nearest-neighbour fails as the dimensionality grows. The split into train/val/test.

  3. Justin Johnson

    SVM loss, softmax loss, regularisation. SGD with momentum. Stepwise vs end-to-end optimisation.

  4. Justin Johnson

    The computational-graph view of backprop. Vectorised gradients. The vanishing-gradient problem in deep networks.

  5. Justin Johnson

    Convolution, pooling, the canonical conv-relu-pool stack. Receptive fields. Why convolutions match images.

  6. Justin Johnson

    Activation functions, weight initialisation (Xavier, He), batch normalisation. Babysitting the learning process.

  7. Justin Johnson

    Optimisers — momentum, Adagrad, RMSProp, Adam. Learning-rate schedules. Regularisation: dropout, weight decay, data augmentation.

  8. Serena Yeung

    AlexNet, VGG, GoogLeNet, ResNet. The ImageNet leaderboard year by year. Residual connections and why they work.

  9. Justin Johnson

    R-CNN, Fast R-CNN, Faster R-CNN, YOLO. Semantic vs instance segmentation.

  10. Justin Johnson

    Image captioning, visual question answering, RNN + CNN hybrids.

  11. Serena Yeung

    Autoencoders, VAEs, GANs. Image generation and the inception-score / FID metrics.

  12. Serena Yeung

    Deep Q-learning for Atari, policy gradients, application to robotics.
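The conv and pooling arithmetic from lecture 5 fits in a few lines. This is a plain-Python sketch; the example layer shapes (AlexNet's first layer, a CIFAR-10-style stack) are standard illustrations, not taken from this page:

```python
def conv_output_size(size, kernel, pad, stride):
    """Spatial output size of a conv (or pooling) layer:
    (size - kernel + 2*pad) // stride + 1."""
    return (size - kernel + 2 * pad) // stride + 1

# AlexNet's first layer: 227x227 input, 11x11 kernels, stride 4, no padding.
print(conv_output_size(227, 11, 0, 4))  # -> 55

# A CIFAR-10-style conv-relu-pool stack on a 32x32 input:
s = conv_output_size(32, 3, 1, 1)  # 3x3 conv, pad 1, stride 1 preserves size: 32
s = conv_output_size(s, 2, 0, 2)   # 2x2 max-pool, stride 2 halves it: 16
print(s)  # -> 16
```

The same formula answers the "does my padding preserve the spatial size?" question that comes up repeatedly in the architecture lectures: with stride 1, padding of (kernel − 1) / 2 keeps the output the same size as the input.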

Self-assessment

A short multiple-choice quiz. Click an option to commit; the correct answer and an explanation appear. Your answers are remembered in this browser.

  1. Question 1. In a convolutional layer with input volume $W \times H \times C$, kernel size $K \times K$, $F$ filters, stride $S$, and zero-padding $P$, the spatial output size is:

  2. Question 2. Why are residual (skip) connections useful in very deep networks?

  3. Question 3. He initialisation scales weights so that the variance of activations and gradients is preserved across ReLU layers. The scale is:

  4. Question 4. Faster R-CNN improves on Fast R-CNN by:

  5. Question 5. Why is data augmentation (random crops, flips, colour jitter) effective for training image classifiers?

This site is currently in Beta. Contact: Chris Paton

Textbook of Usability · Textbook of Digital Health

Auckland Maths and Science Tutoring

AI tools used: Claude (research, coding, text), ChatGPT (diagrams, images), Grammarly (editing).