Glossary

Reinforcement Learning

Also known as: RL

Reinforcement Learning (RL) is the paradigm in which an agent learns to make decisions by interacting with an environment. At each time step, the agent observes a state, selects an action, and receives a scalar reward and a new state. The agent's goal is to learn a policy—a mapping from states to actions—that maximises cumulative reward over time. Unlike supervised learning, there are no correct input–output pairs; the agent must discover good behaviour through trial and error, balancing exploration (trying new actions to gather information) against exploitation (choosing actions known to be rewarding).
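The exploration–exploitation trade-off described above can be sketched with an epsilon-greedy learner on a multi-armed bandit (a one-state RL problem). This is a minimal illustration, not a production algorithm; the arm means and hyperparameters are made up for the example.

```python
import random

def epsilon_greedy_bandit(true_means, steps=5000, epsilon=0.1, seed=0):
    """Trial-and-error learning on a bandit: no labelled answers, only rewards."""
    rng = random.Random(seed)
    n = len(true_means)
    counts = [0] * n          # how often each arm was pulled
    estimates = [0.0] * n     # running mean reward per arm
    for _ in range(steps):
        # Exploration vs exploitation: random arm with probability epsilon,
        # otherwise the arm currently believed best.
        if rng.random() < epsilon:
            a = rng.randrange(n)
        else:
            a = max(range(n), key=lambda i: estimates[i])
        reward = rng.gauss(true_means[a], 1.0)  # noisy scalar reward
        counts[a] += 1
        estimates[a] += (reward - estimates[a]) / counts[a]  # incremental mean
    return estimates

estimates = epsilon_greedy_bandit([0.2, 0.5, 0.9])
best_arm = max(range(3), key=lambda i: estimates[i])
```

After enough pulls the estimates concentrate around the true means, so the greedy choice settles on the highest-reward arm while the epsilon fraction of random pulls keeps gathering information about the others.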

The mathematical foundation of RL is the Markov decision process (MDP), which formalises sequential decision making under uncertainty. Classical algorithms include value iteration, policy iteration, Q-learning, and SARSA—the first two rooted in dynamic programming, the latter two in temporal-difference learning, and all built on the Bellman equation. Modern deep reinforcement learning combines these ideas with neural network function approximators, enabling RL to scale to high-dimensional state spaces such as raw pixels.
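The Bellman-based update at the heart of Q-learning can be shown on a toy MDP. The sketch below, with an invented chain environment (move left or right; reward 1 only on reaching the rightmost, terminal state), applies the standard tabular update Q(s,a) ← Q(s,a) + α[r + γ max_a′ Q(s′,a′) − Q(s,a)]:

```python
import random

def q_learning_chain(n_states=5, episodes=500, alpha=0.5, gamma=0.9,
                     epsilon=0.2, seed=0):
    """Tabular Q-learning on a chain MDP: states 0..n-1, actions 0=left, 1=right,
    reward 1.0 only on entering the rightmost (terminal) state."""
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(n_states)]  # Q[state][action]
    for _ in range(episodes):
        s = 0
        while s != n_states - 1:
            # Epsilon-greedy behaviour policy.
            if rng.random() < epsilon:
                a = rng.randrange(2)
            else:
                a = 0 if Q[s][0] > Q[s][1] else 1
            s_next = max(0, s - 1) if a == 0 else s + 1
            r = 1.0 if s_next == n_states - 1 else 0.0
            # Bellman-backed update toward r + gamma * max_a' Q(s', a').
            target = r + gamma * max(Q[s_next])
            Q[s][a] += alpha * (target - Q[s][a])
            s = s_next
    return Q

Q = q_learning_chain()
```

After training, the greedy policy derived from Q moves right in every non-terminal state, and the learned values decay geometrically with distance from the goal (Q[3][1] ≈ 1, Q[2][1] ≈ γ, and so on), exactly as the Bellman equation predicts.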

RL has produced some of the most dramatic demonstrations of AI capability: DeepMind's DQN learned to play Atari games from raw pixels; AlphaGo and AlphaZero defeated world champions at Go, chess, and shogi; and RLHF (reinforcement learning from human feedback) is now central to training helpful, harmless large language models. RL is the natural framework for robotics, game playing, portfolio management, and any task in which the consequences of decisions unfold over time.

Related terms: RLHF, Agent

Also defined in: Textbook of AI