Glossary

Deep Reinforcement Learning

Also known as: deep RL

Deep reinforcement learning is the combination of deep neural networks with classical reinforcement learning algorithms. The deep network serves as a function approximator for the policy, the value function or both, allowing reinforcement learning to scale to high-dimensional sensory inputs (raw pixels, language tokens, robot sensor readings) that classical tabular RL could not handle.
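A minimal sketch of the function-approximation idea (all names here are illustrative assumptions, not from any particular library): a linear approximator Q(s, a) = w · φ(s, a) stands in for the deep network. Where a tabular Q-function needs one entry per (state, action) pair, the approximator shares parameters across states, which is what lets it generalise to inputs it has never visited.

```python
# Sketch: replacing a Q-table with a parametric approximator,
# trained by a semi-gradient TD(0) update.  A deep network plays
# the same role as the weight vector w here, just with many more
# parameters and a nonlinear feature map.

def features(state, action, n_actions=2):
    # One-hot action crossed with the raw state features.
    phi = [0.0] * (len(state) * n_actions)
    for i, x in enumerate(state):
        phi[action * len(state) + i] = x
    return phi

def q_value(w, state, action):
    return sum(wi * xi for wi, xi in zip(w, features(state, action)))

def td_update(w, state, action, reward, next_state, alpha=0.1, gamma=0.99):
    # Move w toward the bootstrapped target r + gamma * max_a Q(s', a).
    target = reward + gamma * max(q_value(w, next_state, a) for a in (0, 1))
    error = target - q_value(w, state, action)
    phi = features(state, action)
    return [wi + alpha * error * xi for wi, xi in zip(w, phi)]

w = [0.0] * 4  # 2 state features x 2 actions
w = td_update(w, state=[1.0, 0.0], action=0, reward=1.0, next_state=[0.0, 1.0])
```

Swapping the linear map for a deep network changes none of the structure above, only how the gradient step is computed; that swap is what makes the stability questions discussed below hard.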

The defining results of deep RL: DQN (Mnih et al., 2013, 2015), which learned Atari games from raw pixels; AlphaGo (Silver et al., 2016), Go; AlphaGo Zero and AlphaZero (2017), board games learned from self-play alone; OpenAI Five (2018), Dota 2; AlphaStar (Vinyals et al., 2019), StarCraft II; and RLHF post-training of GPT models (2022), which fine-tuned language models to be helpful and safe.

Deep RL inherits classical RL's exploration challenge: the agent must balance exploration (gathering information about the environment) against exploitation (using current knowledge to maximise reward). It adds a function-approximation challenge: how to make Q-learning, policy gradients and other algorithms numerically stable when the function being learned is a deep network with millions of parameters. The "deadly triad" of function approximation, off-policy learning and bootstrapping continues to make deep RL a substantially harder regime to work in than supervised deep learning.
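The exploration–exploitation balance is most often handled with epsilon-greedy action selection; the sketch below (function names are assumptions for this example) anneals epsilon so early training explores widely and later training mostly exploits.

```python
import random

# Epsilon-greedy: with probability epsilon take a random action
# (explore); otherwise take the action with the highest Q-value
# (exploit).

def epsilon_greedy(q_values, epsilon, rng=random):
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))                       # explore
    return max(range(len(q_values)), key=q_values.__getitem__)    # exploit

def epsilon_schedule(step, start=1.0, end=0.05, decay_steps=10_000):
    # Linear anneal from `start` to `end` over `decay_steps` steps.
    frac = min(step / decay_steps, 1.0)
    return start + frac * (end - start)
```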

Standard modern deep-RL algorithms include DQN (and its many variants: Double DQN, Dueling DQN, Rainbow), PPO (the dominant policy-gradient method), SAC (Soft Actor-Critic, the standard for continuous-action problems), and MuZero (model-based, planning in a learned latent space).

Related terms: Reinforcement Learning, DQN, AlphaGo, PPO
