RL · DeepMind × UCL · 2021

Reinforcement Learning Lecture Series

with Hado van Hasselt, Diana Borsa, Matteo Hessel

Official course page →

Your progress in this browser


Progress is stored in this browser only — there is no account, no login, and no database. Clearing your browser data will reset it.

About the course

The DeepMind × UCL series, taught in 2021 by Hado van Hasselt and colleagues, is the successor to David Silver's classic 2015 UCL course. It is the cleanest currently available presentation of reinforcement learning, going from the mathematical foundations (MDPs, dynamic programming, the Bellman equations) through model-free control (Q-learning, SARSA, policy gradients) to deep RL methods (DQN, A3C, AlphaGo-style search). The lecturers were all part of the research group that produced those results, so the historical commentary is first-hand.

The series assumes a working knowledge of probability and the basics of deep learning; read our probability and neural-networks chapters first if you need them. It pairs naturally with our modern-AI chapter, where we discuss RLHF (reinforcement learning from human feedback), the most visible application of these ideas to large language models in 2024–2025.

Watch the lectures

Open the full playlist on YouTube →

Syllabus

Tick lectures as you finish them. Your ticks live in this browser only.

  1. Hado van Hasselt

    What RL is — agents, environments, rewards, policies. The interaction-loop view. How RL differs from supervised learning.

  2. Hado van Hasselt

    Bandits, $\varepsilon$-greedy, UCB, Thompson sampling. The regret-bound view. (An $\varepsilon$-greedy sketch appears after the syllabus.)

  3. Diana Borsa

    MDPs, the Bellman equations for the value and action-value functions. Stationary policies.

  4. Diana Borsa

    Policy evaluation, policy improvement, policy iteration, value iteration. Convergence guarantees. (A value-iteration sketch appears after the syllabus.)

  5. Hado van Hasselt

    Monte Carlo and TD($\lambda$) prediction. Why bootstrapping works.

  6. Hado van Hasselt

    On-policy (SARSA) vs off-policy (Q-learning) control. The interplay with exploration. (Both update rules are sketched after the syllabus.)

  7. Hado van Hasselt

    Linear function approximation, the deadly triad (bootstrapping + off-policy + approximation), what goes wrong.

  8. Hado van Hasselt

    The policy-gradient theorem, REINFORCE, baselines, advantage estimation. (A REINFORCE sketch appears after the syllabus.)

  9. Matteo Hessel

    DQN, experience replay, target networks. Stability tricks that made deep value-based RL work.

  10. Matteo Hessel

    Tree search, Monte Carlo tree search. The architecture used in AlphaGo and AlphaZero.

  11. Hado van Hasselt

    Per-decision importance sampling. The high-variance problem and weighted importance sampling.

  12. Hado van Hasselt

    Learning a model of the dynamics. Dyna, value-gradient methods, MuZero.
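
A few minimal Python sketches of syllabus topics follow. They are not from the course materials, and every environment, reward value, and hyperparameter in them is invented for illustration.

First, the exploration trade-off from lecture 2: an $\varepsilon$-greedy agent keeps a sample-average value estimate per arm and picks a random arm with probability $\varepsilon$. The three arm means and the Gaussian rewards are hypothetical.

```python
# epsilon-greedy on a 3-armed Gaussian bandit (illustrative; arm means are made up)
import numpy as np

def run_epsilon_greedy(true_means, steps=1000, epsilon=0.1, seed=0):
    rng = np.random.default_rng(seed)
    n_arms = len(true_means)
    q = np.zeros(n_arms)       # sample-average estimate of each arm's value
    counts = np.zeros(n_arms)  # pulls per arm
    for _ in range(steps):
        if rng.random() < epsilon:
            a = int(rng.integers(n_arms))   # explore: uniformly random arm
        else:
            a = int(np.argmax(q))           # exploit: greedy arm
        r = rng.normal(true_means[a], 1.0)  # reward ~ N(mean_a, 1)
        counts[a] += 1
        q[a] += (r - q[a]) / counts[a]      # incremental sample-average update
    return q

print(run_epsilon_greedy([0.1, 0.5, 0.9]))  # estimates approach the true means
```

Swapping the greedy/random choice for an upper-confidence-bound score is a small change and connects to the regret bounds discussed in the lecture.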
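
Lecture 4's dynamic-programming algorithms are easiest to see on a tiny tabular MDP. The two-state example below is made up; value iteration simply applies the Bellman optimality backup until the values stop changing.

```python
# value iteration on a made-up two-state, two-action MDP
import numpy as np

# P[s][a] = list of (probability, next_state, reward) transitions
P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(0.8, 1, 1.0), (0.2, 0, 0.0)]},
    1: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 2.0)]},
}
gamma = 0.9

V = np.zeros(len(P))
while True:
    # Bellman optimality backup: V(s) <- max_a E[r + gamma * V(s')]
    V_new = np.array([
        max(sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a]) for a in P[s])
        for s in sorted(P)
    ])
    if np.max(np.abs(V_new - V)) < 1e-8:
        break
    V = V_new

print(V)  # the fixed point of the Bellman optimality operator
```

Because the backup is a $\gamma$-contraction, the loop is guaranteed to converge to the optimal values, which is the convergence result the lecture works through.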
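
Lecture 6's on-policy vs off-policy distinction comes down to which action value the one-step target bootstraps from. The two update functions below make that difference explicit; the table size and the state and action indices are arbitrary.

```python
# one-step SARSA (on-policy) vs Q-learning (off-policy) updates on a tabular Q
import numpy as np

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.99):
    # on-policy: bootstrap from the action the behaviour policy actually takes next
    td_target = r + gamma * Q[s_next, a_next]
    Q[s, a] += alpha * (td_target - Q[s, a])

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    # off-policy: bootstrap from the greedy action, whatever the behaviour policy does
    td_target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (td_target - Q[s, a])

Q = np.zeros((5, 2))  # toy table: 5 states, 2 actions
q_learning_update(Q, s=0, a=1, r=1.0, s_next=2)
print(Q[0])
```

The `np.max` in the Q-learning target is exactly why it is off-policy: it evaluates the greedy policy even while the agent behaves, say, $\varepsilon$-greedily.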
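
For lecture 8, the policy-gradient idea fits in a few lines when the policy is a softmax over a handful of actions and each episode is a single step. The arm means, learning rates, and running-average baseline below are all invented for the example.

```python
# REINFORCE with a running-average baseline on a softmax policy (one-step episodes)
import numpy as np

rng = np.random.default_rng(0)
true_means = np.array([0.1, 0.5, 0.9])  # hypothetical expected rewards per action
theta = np.zeros(3)                      # one policy parameter per action
baseline, alpha = 0.0, 0.1

for _ in range(2000):
    probs = np.exp(theta - theta.max())
    probs /= probs.sum()                 # softmax policy: pi(a) = exp(theta_a) / sum
    a = rng.choice(3, p=probs)
    r = rng.normal(true_means[a], 1.0)   # one-step episode, so the return is just r
    grad_log_pi = -probs                 # grad of log pi(a): one_hot(a) - probs
    grad_log_pi[a] += 1.0
    theta += alpha * (r - baseline) * grad_log_pi  # REINFORCE ascent step
    baseline += 0.05 * (r - baseline)    # baseline reduces variance, not bias

print(np.round(probs, 3))  # probability mass should shift toward the best action
```

Subtracting the baseline does not change the expected gradient, because it multiplies $\nabla_{\boldsymbol{\theta}} \log \pi$, which has zero mean under the policy; that is why baselines can be used freely to cut variance.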

Self-assessment

A short multiple-choice quiz. Click an option to commit your answer; the correct answer and an explanation then appear. Your answers are remembered in this browser.

  1. An agent's discounted return from time $t$ is $G_t = \sum_{k=0}^{\infty} \gamma^k R_{t+k+1}$. The discount factor $\gamma$ being $< 1$ ensures:

  2. The Bellman optimality equation for the action-value function $Q^*$ is:

  3. Q-learning is off-policy because:

  4. The deadly triad in RL is the combination of:

  5. DQN combines Q-learning with a deep network. Two stability tricks that make this work are:

  6. The policy-gradient theorem says that the gradient of the expected return with respect to the policy parameters $\boldsymbol{\theta}$ is:

This site is currently in Beta. Contact: Chris Paton

Textbook of Usability · Textbook of Digital Health

Auckland Maths and Science Tutoring

AI tools used: Claude (research, coding, text), ChatGPT (diagrams, images), Grammarly (editing).